Re: Standardizing Firefox's Implementation of Link Fingerprints

On 7/2/07, Eric Lawrence <ericlaw@exchange.microsoft.com> wrote:
> - The definition of this behavior across "all" media types is a significant departure from the XPointer syntax registered for XML fragments, and does present some level of compatibility concern.
Are you referring to #hash() potentially taking up the fragment
identifier space of a MIME type? I.e., a MIME type might want to
define its own fragment identifier starting with "hash("? Hopefully
this won't be an issue in the future if other fragment identifiers
follow syntax similar to xpointer() and hash(). Currently the only
"major" conflict is with text/html, where user-agents will try to
jump to an element whose id matches the "hash()" identifier (though
XHTML 'id' values cannot contain parentheses or colons anyway).
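
(For illustration only, a rough Python sketch of how a user-agent
might separate a hash() fragment from an ordinary fragment
identifier; the hash(<algorithm>:<hexdigest>) shape and the helper
name are my assumptions for the example, not the draft's normative
grammar.)

import re
from urllib.parse import urldefrag

def parse_link_fingerprint(uri):
    # Returns (algorithm, hexdigest) for a hash() fragment, or None
    # for an ordinary fragment identifier (e.g., an HTML anchor id).
    _, fragment = urldefrag(uri)
    match = re.fullmatch(r"hash\((\w+):([0-9A-Fa-f]+)\)", fragment)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()

# Example: a download link carrying a made-up SHA-256 fingerprint.
print(parse_link_fingerprint(
    "http://example.org/setup.exe#hash(sha256:" + "ab" * 32 + ")"))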

> - Does this mechanism permit use of the "hash fragment" alongside the Media-Type specific fragment (that is to say, can I have a hyperlink which contains both a Hash, and a HTML anchor ID)?
There is potential for conflict with the fragment identifier in that
it has to be either Link Fingerprints or the MIME-type specific
fragment identifier. Currently, the main uses of the fragment
identifier are for HTML, XML, PDF (and potentially text/plain [1]).
The main use case for Link Fingerprints is checking content that is
not displayed by the web browser, i.e., file downloads. However, this
edge case can occur, and the link provider would need to decide
whether to use the original fragment identifier or Link Fingerprints.

There were previous discussions when designing Link Fingerprints about
allowing multiple fragment identifiers in a URI, but that would either
require changing the URI Generic Syntax or defining the "multiple
fragment identifier" syntax for "all media types" as well.

> - The draft may benefit from a more explicit statement of the algorithm (for at least HTTP) that specifies the steps taken in the process of evaluation of the hash.  For HTTP, I'd imagine that this includes removing any HTTP Compression (gzip/deflate) and Transfer-Encoding, before hashing the resulting body?
Right. The bits used to compute the hash would be the bits just before
they are made available to the user. The Firefox implementation
basically reuses the same stream-converter interface that turns gzip
data into uncompressed data: it "converts" the Link Fingerprinted data
on the fly (since hashes can be computed incrementally) and returns an
error code at the end of the data transfer if the fingerprint does not
match. The Link Fingerprints converter had to be placed after
decompression and before the consumer of the data.
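
(A rough sketch, not Firefox's actual interfaces, of that converter
placement: each chunk is decompressed, fed through an incremental
hash, and passed straight on to the consumer, with a mismatch only
reportable at the end of the transfer. The function and parameter
names are made up for the example.)

import hashlib
import zlib

def verify_stream(compressed_chunks, algorithm, expected_hexdigest, consumer):
    # decompressobj(16 + MAX_WBITS) handles the gzip wrapper, mirroring
    # the "remove Content-Encoding before hashing" step discussed above.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    hasher = hashlib.new(algorithm)
    for chunk in compressed_chunks:
        data = decompressor.decompress(chunk)
        hasher.update(data)   # hash the bytes the user would actually receive
        consumer(data)        # forward on the fly so progress/rendering continue
    tail = decompressor.flush()
    hasher.update(tail)
    consumer(tail)
    # Only at the end of the transfer can the mismatch be reported.
    if hasher.hexdigest().lower() != expected_hexdigest.lower():
        raise ValueError("Link Fingerprint mismatch")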

> - What is the expected behavior of a user-agent when it follows a redirection that returns, for instance, a 404 page?  The hash will obviously not be valid at that point.
The 404 page would be treated as if it were the result of the
download, so the user-agent would indicate that there was a Link
Fingerprint failure. (Unless the link provider wanted to put a Link
Fingerprint for the 404 page...)

> - In order for the hashing algorithm to be effective, the entire body must first be retrieved, correct?  This could significantly change the behavior of a user-agent rendering the response.
Right. Again, because this was designed for file downloads, Firefox's
network layer will make the Link Fingerprinted data available to the
higher-level consumers (e.g., the download manager or page rendering
engine) so that progress bars are updated correctly and pages can
display incrementally. If some content is displayed in the user-agent
rather than saved to a file, the user-agent would need to retract the
data already shown (e.g., an image that partially loaded would instead
show a broken image). An open issue with the current implementation is
that scripts in an HTML page could redirect away before the original
page finishes loading and the hash can be computed.
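
(Again just an illustrative sketch rather than Firefox code: one way
a download consumer could cope with a failure that is only detectable
at end of transfer is to spool to a temporary file, report progress
as data arrives, and only commit or discard the file once the hash is
known. All names here are invented for the example.)

import os
import tempfile

def download_with_fingerprint(chunks, verify, destination):
    # Spool to a temporary file so progress can be reported as data
    # arrives, but only move the file into place once the fingerprint
    # has checked out.
    fd, temp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as temp_file:
            for chunk in chunks:
                temp_file.write(chunk)       # progress updates would happen here
        if not verify(temp_path):            # hash check after the whole body arrived
            raise ValueError("Link Fingerprint mismatch; download discarded")
        os.replace(temp_path, destination)   # commit only after verification
    except Exception:
        if os.path.exists(temp_path):
            os.remove(temp_path)             # "retract" the partial/failed download
        raise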

Ed

[1] http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-06.txt
