W3C home > Mailing lists > Public > ietf-http-wg@w3.org > January to March 2011

Re: draft-bryan-metalinkhttp-18.txt

From: Anthony Bryan <anthonybryan@gmail.com>
Date: Wed, 19 Jan 2011 04:33:23 -0500
Message-ID: <AANLkTimF6bmz=eQ8bevJG_j1gui+ZQKwAyDQOSwcng6D@mail.gmail.com>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: ietf-http-wg@w3.org, draft-bryan-metalinkhttp@tools.ietf.org
hi Julian. thanks again for your time & help.

On Sun, Jan 16, 2011 at 2:37 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 25.09.2010 16:29, Peter Pöml wrote:
>>
>> Hi,
>>
>> a few thoughts regarding version 18 of
>> http://tools.ietf.org/html/draft-bryan-metalinkhttp
>> ...
>
> Below is my feedback (nothing serious, I'd say):
>
> Abstract
>
>   This document specifies Metalink/HTTP: Mirrors and Cryptographic
>   Hashes in HTTP Headers, a different way to get information that is
>
> ...header fields... (yes, bean counting...)

changed.

>   usually contained in the Metalink XML-based download description
>   format.  Metalink/HTTP describes multiple download locations
>   (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
>   and other information using existing standards for HTTP headers.
>   Clients can transparently use this information to make file transfers
>   more robust and reliable.
>
> maybe strike "transparently"?

ok.

>   ...
>
> 1.  Introduction
>
>   Metalink/HTTP is an alternative representation of Metalink
>   information, which is usually presented as an XML-based document
>   format [RFC5854].  Metalink/HTTP attempts to provide as much
>   functionality as the Metalink/XML format by using existing standards
>   such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230],
>   and ETags [RFC2616].  Metalink/HTTP is used to list information about
>
> You may want to expand ETags once to "Entity Tags".

of course, oops.

>   a file to be downloaded.  This can include lists of multiple URIs
>   (mirrors), Peer-to-Peer information, cryptographic hashes, and
>   digital signatures.
>
>   ...
>
>   This document describes a mechanism by which the benefit of mirrors
>   can be automatically and more effectively realized.  All the
>   information about a download, including mirrors, cryptographic
>   hashes, digital signatures, and more can be transferred in
>   coordinated HTTP Headers.  This Metalink transfers the knowledge of
>
> Incomplete sentence? "This Metalink transfers..." Or maybe state once that
> you call the thing described before "Metalink".

expanded the sentence before with "hereafter referred to as a Metalink." :

All the information about a download, including mirrors, cryptographic
hashes, digital signatures, and more can be transferred in coordinated
HTTP header fields hereafter referred to as a Metalink. This Metalink
transfers the knowledge of the download server (and mirror database)
to the client.

>   the download server (and mirror database) to the client.  Clients can
>   fallback to other mirrors if the current one has an issue.  With this
>   knowledge, the client is enabled to work its way to a successful
>   download even under adverse circumstances.  All this is done
>   transparently to the user and the download is much more reliable and
>
> Doing it totally transparently might be a problem with respect to user
> privacy. Maybe the user doesn't *want* to hit a particular server?

Alexey suggested a change too. it now says:

"All this can be done without complicated user interaction and the
download can be much more reliable and efficient."

>   ...
>
>   [[ Discussion of this draft should take place on IETF HTTP WG mailing
>   list at ietf-http-wg@w3.org or the Metalink discussion mailing list
>   located at metalink-discussion@googlegroups.com.  To join the list,
>   visit http://groups.google.com/group/metalink-discussion . ]]
>
> This should go on the front page as "Editorial Note".

in <abstract>?

I've moved it to there where it says:

[[ Discussion of this draft should take place on IETF HTTP WG mailing
list at ietf-http-wg@w3.org although this draft is not a WG item. ]]

>   ...
>
> 1.2.  Examples
>
>   A brief Metalink server response with ETag, mirrors, .metalink,
>   OpenPGP signature, and a cryptographic hash of the whole file:
>
>   Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
>   Link: <http://www2.example.com/example.ext>; rel="duplicate"
>   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
>   Link: <http://example.com/example.ext.torrent>; rel="describedby";
>   type="application/x-bittorrent"
>   Link: <http://example.com/example.ext.metalink>; rel="describedby";
>   type="application/metalink4+xml"
>   Link: <http://example.com/example.ext.asc>; rel="describedby";
>   type="application/pgp-signature"
>   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
>   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==
>
> Note: there's no need quote the relation type here.

here or throughout? they are quoted in the RFC 5988 examples.

   Link: <http://example.com/TheBook/chapter2>; rel="previous";
         title="previous chapter"

or the MIME type in the type parameter doesn't need to be quoted?

> Note: it's unfortunate that there doesn't seem to be a registered media type
> for torrent files.

true. I'd fix that if I could. :)

>   ...
>
>   Metalink resources include a Link header [RFC5988] to present a list
>   of mirrors in the response to a client request for the resource.
>   Metalink servers MUST include the cryptographic hash of a resource
>   via Instance Digests in HTTP [RFC3230].  Valid algorithms are found
>   in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest
>   Algorithm Values" at
>   http://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml .
>
> Surplus whitespace. Maybe put the URI into angle brackets.

I've also seen registries cited in the references. is that better?

Valid algorithms are found in the IANA registry named "Hypertext
Transfer Protocol (HTTP) Digest Algorithm Values" at
<http://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml>.

>   Metalink servers are HTTP servers with one or more Metalink
>   resources.  Metalink servers MUST support the Link header for listing
>   mirrors and MUST support Instance Digests in HTTP [RFC3230].  Mirror
>   and cryptographic hash information provided by the originating
>   Metalink server MUST be considered authoritative.  Metalink servers
>
> I have a problem with "MUST be considered authorative" when there's no clear
> explanation what "authorative". Is this really a conformance requirement?

removed

>   and their associated mirror servers are RECOMMENDED to all share the
>
> I think it's easier to read when you use SHOULD instead of RECOMMENDED (this
> applies to many more places).

ack

>   same ETag policy (ETag Synchronization), i.e. based on the file
>   contents (cryptographic hash) and not server-unique filesystem
>   metadata.  The emitted ETag MAY be implemented the same as the
>
> As the term "Etag policy" is important, it might make sense to introduce it
> more formally.

"Metalink servers and their associated mirror servers SHOULD all share
the same ETag policy, meaning ETags are synchronized across servers,
i.e. byte-for-byte identical files will have the same ETag on all
servers. ETags could be based on the file contents (cryptographic
hash) and not server-unique filesystem metadata."

>   Instance Digest for simplicity.  Metalink servers MAY offer Metalink/
>   XML documents that contain cryptographic hashes of parts of the file
>   and other information.
>
> I'd change "MAY" to "can" here and in many more places. Use MAY (and
> friends) only to explain conformance requirements.

will do.

>   ...
>
>   Metalink clients use the mirrors provided by a Metalink server with
>   Link header [RFC5988].  Metalink clients MUST support HTTP and are
>
> "with *a* link header"?

yes, oops.

>   ...
>
>   A brief Metalink server response with two mirrors only:
>
>   Link: <http://www2.example.com/example.ext>; rel="duplicate";
>   pri=1; pref=1
>   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
>   pri=2; geo="gb"; depth=1
>
>   [[Some organizations have many mirrors.  Only send a few mirrors, or
>   only use the Link header if Want-Digest is used?]]
>
> RFC 5988 really doesn't say who can define extension parameters. It probably
> should. Mark?
>
>   ...
>
>   This is purely an expression of the server's preferences; it is up to
>   the client what it does with this information, particularly with
>   reference to how many servers to use at any one time.  A client MUST
>   respect the server's priority ordering, however.
>
> What does it mean to "respect" it? Why is this a MUST?

good question! conflicting huh? removed.

>   ...
>
>   Mirror servers MAY have a "geo" value, which is a [ISO3166-1] alpha-2
>
> "Entries for a mirror server can have..."

changed.

>   ...
>
>   There are two types of mirror servers: preferred and normal.
>   Preferred mirror servers are HTTP mirror servers that MUST share the
>   same ETag policy as the originating Metalink server.  Preferred
>   mirrors make it possible to detect early on, before data is
>   transferred, if the file requested matches the desired file.
>
> Note: that also could be achieved by introducing a new conditional header
> for the digest, or by using the extension points in the WebDAV "If" header.

I don't know much about WebDAV but I read section 10.4 of RFC 4918. I
still don't understand.

we use If-Match in the examples. is that wrong?

>   Preferred HTTP mirror servers have a "pref" value of 1.  By default,
>   if unspecified then mirrors are considered "normal" and do not
>   necessarily share the same ETag policy.  FTP mirrors, as they do not
>   emit ETags, are considered "normal". ([draft-ietf-ftpext2-hash]
>   allows for FTP mirrors to be coordinated and provide file hashes).
>
> If you need only "1", then this parameter may not need a value at all (the
> grammar allows that).

oh, ok. yes, only 1.

>   ...
>
>   [[Suggestion: In order for clients to identify servers that have
>   coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.
>
>   ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="
>
>   ]]
>
> I think this may be hard to deploy (changing etags as opposed to adding
> metadata).

this was removed.

>   ...
>
>   Mirror servers MAY have a "depth" value, where "depth=0" is the
>   default.  A value of 0 means ONLY that file is mirrored.  A value of
>   1 means that file and all other files and subdirectories in the
>   directory are mirrored.  A value of 2 means the directory above, and
>   all files and subdirectories, are mirrored.  For each higher value,
>   another directory closer to the root is mirrored.
>
> This probably should be rephrased in terms of URI path segments.

first attempt:

"Mirror servers MAY have a "depth" value, where "depth=0" is the
default. A value of 0 means ONLY that file is mirrored and that other
URI path segments are not. A value of 1 means that file and all other
files and URI path segments contained in the rightmost URI path
segment are mirrored. For values of N, you go up N-1 URI path segments
above. A value of 2 means means going up one URI path segment above,
and all files and URI path segments contained are mirrored. For each
higher value, another URI path segment closer to the Host is
mirrored."

>   ...
>
> 6.  Cryptographic Hashes of Whole Files
>
>   Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
>   files they describe with mirrors.  Mirror servers SHOULD as well.
>
> Thus, when the Digest is missing, the Link headers should be ignored? must?

If Instance Digests are not provided by the Metalink servers, the Link
header fields MUST be ignored.

>
>   ...
>
> 7.  Client / Server Multi-source Download Interaction
>
>   Metalink clients begin a download with a standard HTTP [RFC2616] GET
>   request to the Metalink server.  A Range limit is optional, not
>   required.  Alternatively, Metalink clients can begin with a HEAD
>   request to the Metalink server to discover mirrors via Link headers.
>
> So returning them on HEAD is REQUIRED, right?

yes, added this to the requirements section:

Metalink servers MUST return the same Link header fields and Instance
Digests on HEAD requests.

>   ...
>   Downloads from mirrors that do not have the same file size as the
>   Metalink server MUST be aborted.
>
> It this is a MUST it needs more details (closing the connection?). Otherwise
> simply state that such as response needs to be considered unusable and leave
> it to the client how to deal with it.

"Downloads from mirrors that do not have the same file size as the
Metalink server are considered unusable and the client can deal with
it as it sees fit. "

>   ...
>   Once the download has completed, the Metalink client MUST verify the
>   cryptographic hash of the file.
>
> And then do what if it fails?

expanded this to:

Once the download has completed, the Metalink client MUST verify the
cryptographic hash of the file. If the cryptographic hash offered by
the Metalink server with Instance Digests does not match the
cryptographic hash of the downloaded file, see Section 7.1.2 for a
possible way to repair errors.

If the download can not be repaired, it is considered corrupt. The
client can attempt to re-download the file.

>   ...
>   The size of chunks chosen by the client should be sufficiently large
>   that the chunk request headers and reponse headers represent neglible
>   overhead, and sufficiently large that they can be pipelined
>   effectively without needing a very high rate of chunk requests.  At
>   the same time, the amount of time wasted waiting for the last chunk
>   to download from the last server after all the other servers have
>   finished should be minimized.  Thus we currently recommend that a
>   chunk size of at least 10KBytes should be used.  If the file being
>   transfered is very large, or the download speed very high, this can
>   be increased to perhaps 1MByte.  As network bandwidths increase, we
>   expect these numbers to increase appropriately, so that the time to
>   transfer a chunk remains significantly larger than the latency of
>   requesting a chunk from a server.
>
> Wow. This appears to ignore the overhead of Range requests on the *server*.
> Note that sometimes, content is not served directly from the filesystem, and
> implementing Range may not be possible using seeks. Now one could argue that
> servers suffering from the problem should not support this in the first
> place, but still...
>
> Given the file sizes for which parallel downloads make any sense today, is
> it *really* a good idea to recommend 10K segments?

nope. how about removing the last 3 sentences & adding
"Note that Range requests impose an overhead on servers and clients
need to be aware of that and not abuse them. "

> 9.  IANA Considerations
>
>   Accordingly, IANA has made the following registration to the Link
>   Relation Type registry.
>
>   ...
>
> No, they haven't yet :-) Just state what they are supposed to do, the RFC
> Editor will rephrase this on publication anyway.

yes :)

thanks for the thorough review!

-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads
Received on Wednesday, 19 January 2011 09:33:52 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 27 April 2012 06:51:36 GMT