- From: Edward Lee <edilee@mozilla.com>
- Date: Mon, 2 Jul 2007 18:32:05 -0700
- To: "Eric Lawrence" <ericlaw@exchange.microsoft.com>
- Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
On 7/2/07, Eric Lawrence <ericlaw@exchange.microsoft.com> wrote:
> - The definition of this behavior across "all" media types is a significant departure from the XPointer syntax registered for XML fragments, and does present some level of compatibility concern.

Are you referring to #hash() potentially taking up the fragment identifier space of a MIME-type? I.e., a MIME-type might want to define its own fragment identifier starting with "hash("? Hopefully this won't be an issue in the future if other fragment identifiers follow a syntax similar to xpointer() and hash(). Currently the only "major" conflict is with text/html, where user-agents will try to jump to a "hash()" identifier (not that XHTML 'id's allow parens and colons).

> - Does this mechanism permit use of the "hash fragment" alongside the Media-Type specific fragment (that is to say, can I have a hyperlink which contains both a Hash, and a HTML anchor ID)?

There is potential for conflict here, in that the fragment identifier has to be either a Link Fingerprint or the MIME-type specific fragment identifier. Currently, the main uses of the fragment identifier are for html, xml, pdf (and potentially text/plain [1]). The main use case of Link Fingerprints is checking content that is not displayed by the web browser, i.e., file downloads. However, this edge case can occur, and the link provider would need to decide whether to use the original fragment identifier or Link Fingerprints.

There were previous discussions when designing Link Fingerprints about allowing multiple fragment identifiers in a URI, but that would require either changing the URI Generic Syntax or defining the "multiple fragment identifier" syntax for "all media types" as well.

> - The draft may benefit from a more explicit statement of the algorithm (for at least HTTP) that specifies the steps taken in the process of evaluation of the hash.
> For HTTP, I'd imagine that this includes removing any HTTP Compression (gzip/deflate) and Transfer-Encoding, before hashing the resulting body?

Right. The bits used to compute the hash would be the bits just before being made available to the user. The implementation for Firefox basically uses the same interface as the gzip->uncompressed converter to "convert" the Link Fingerprinted data on the fly (because hashes can be computed incrementally) and returns an error code at the end of the data transfer if the fingerprint does not match. The Link Fingerprints converter had to be placed after the decompression step and before the consumer of the data.

> - What is the expected behavior of a user-agent when it follows a redirection that returns, for instance, a 404 page? The hash will obviously not be valid at that point.

The 404 page would be treated as if it were the result of the download, so the user-agent would indicate that there was a Link Fingerprint failure. (Unless the link provider wanted to put a Link Fingerprint for the 404 page.)

> - In order for the hashing algorithm to be effective, the entire body must first be retrieved, correct? This could significantly change the behavior of a user-agent rendering the response.

Right. Again, because this was designed for file downloads, Firefox's network layer will make the Link Fingerprinted data available to the higher-level consumers (e.g., the download manager or the page display engine) as it arrives, so that progress bars are updated correctly and pages can display incrementally. If some content is to be displayed in the user-agent and not saved to a file, the user-agent would need to retract the data already shown when the fingerprint fails (e.g., an image that partially loaded would instead show a broken image).

An open issue with the current implementation is that scripts on an HTML page could redirect away before the original page finishes loading, i.e., before the hash can be computed.

Ed

[1] http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-06.txt
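
To make the evaluation order discussed above concrete, here is a rough Python sketch of the pipeline: undo the gzip coding, hash the decoded bytes incrementally while passing them on to the consumer, and signal a failure only at the end of the transfer. This is not the Firefox code. The function and class names are hypothetical, and the fragment shape "hash(<algorithm>:<hex digest>)" is assumed from the parens-and-colons form mentioned in this thread.

```python
import gzip
import hashlib
import re
import zlib

# Assumed fragment shape: hash(<algorithm>:<hex digest>), without the leading '#'.
FRAGMENT_RE = re.compile(r"^hash\(([A-Za-z0-9]+):([0-9A-Fa-f]+)\)$")


class FingerprintMismatch(Exception):
    """Raised at end of transfer when the computed digest differs."""


def parse_fingerprint(fragment):
    """Return (algorithm, hex_digest) in lowercase, or None for other fragments."""
    match = FRAGMENT_RE.match(fragment)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


def gunzip_stream(chunks):
    """Undo a gzip content coding chunk by chunk."""
    decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # 16 selects gzip framing
    for chunk in chunks:
        data = decoder.decompress(chunk)
        if data:
            yield data
    tail = decoder.flush()
    if tail:
        yield tail


def fingerprint_stream(chunks, algorithm, expected_hex):
    """Pass decoded chunks through unchanged while hashing them incrementally.

    The consumer sees data as it arrives; the mismatch error is raised only
    after the last chunk, so the consumer must be ready to discard its output.
    """
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
        yield chunk
    if hasher.hexdigest() != expected_hex.lower():
        raise FingerprintMismatch("Link Fingerprint check failed")


if __name__ == "__main__":
    body = b"example download body"
    wire = gzip.compress(body)  # what the server would send with gzip coding
    fragment = "hash(sha256:" + hashlib.sha256(body).hexdigest() + ")"
    algorithm, digest = parse_fingerprint(fragment)
    decoded = gunzip_stream([wire[:10], wire[10:]])
    received = b"".join(fingerprint_stream(decoded, algorithm, digest))
    print(received == body)
```

Placing the hashing step after gunzip_stream and before the consumer mirrors the converter ordering described above. Because the error arrives only after the final chunk, a consumer that has already displayed or saved data has to roll it back itself.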
Received on Tuesday, 3 July 2007 01:32:20 UTC