[w3c/manifest] Manifest id should strip fragment at parse time (Issue #1121)

Currently, the `id` member does _not_ have its fragment stripped in the parser, but in both places where the spec mentions _comparing_ identities, it uses [exclude fragments](https://url.spec.whatwg.org/#url-equals-exclude-fragments). This strongly suggests that the canonical `id` should not include a fragment, and the current behavior is creating implementation problems.

There's even an example of the `id` parsing that leaves the fragment in, which is confusing:

| json["id"] | manifest["start_url"] | manifest["id"] |
| ------------- | ------------- | ---------- |
| undefined | "https://example.com/my-app/#here" | "https://example.com/my-app/#here" |

For all intents and purposes, the "#here" is not part of the id: it would compare equal to any app id with the same origin, path and query, but a different fragment.

The problem with this is that it encourages user agents to _retain_ the fragment when storing the identity in a database, but then do comparisons that ignore the fragment later. However, if the id is used as a database primary key, or hashed before being compared, it can be difficult to correctly recognize two ids as the same app if they differ only by fragment.
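
To make the mismatch concrete, here is a minimal sketch using the standard WHATWG `URL` class. The helper `equalsExcludingFragments` and the `Map` standing in for an app database are hypothetical illustrations, not spec text or Chromium code:

```typescript
// Hypothetical stand-in for the URL Standard's "equals" with
// "exclude fragments" enabled: compare serializations minus the fragment.
function equalsExcludingFragments(a: string, b: string): boolean {
  const ua = new URL(a);
  const ub = new URL(b);
  ua.hash = ""; // setting hash to "" removes the fragment
  ub.hash = "";
  return ua.href === ub.href;
}

// Hypothetical app database keyed by the raw (unstripped) id string.
const appsById = new Map<string, string>();
appsById.set("https://example.com/my-app/#here", "My App");

const incoming = "https://example.com/my-app/#there";

// The spec-style comparison treats these as the same app...
const sameApp = equalsExcludingFragments(
  "https://example.com/my-app/#here",
  incoming,
);

// ...but an exact-match primary-key lookup misses the stored entry.
const foundByKey = appsById.has(incoming);
```

Here `sameApp` is `true` while `foundByKey` is `false`, which is exactly the situation a keyed or hashed store cannot recover from without scanning every entry.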

**The fragment should be stripped out at parse time, not at comparison time, so that the stored `id` is the canonical id.**
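
A rough sketch of what parse-time stripping could look like, again with the WHATWG `URL` class. The function name `processId` is hypothetical, and the fallback to `start_url` plus the same-origin check are my assumptions about the surrounding algorithm, not normative text:

```typescript
// Hypothetical sketch: compute manifest["id"] with the fragment removed
// at parse time. Not the normative algorithm.
function processId(jsonId: unknown, startUrl: string): string {
  const start = new URL(startUrl);
  let id = start;

  if (typeof jsonId === "string" && jsonId !== "") {
    try {
      // Assumption: json["id"] is resolved against start_url's origin.
      const parsed = new URL(jsonId, start.origin);
      // Assumption: a cross-origin id is ignored in favour of start_url.
      if (parsed.origin === start.origin) id = parsed;
    } catch {
      // Unparsable id: keep the start_url default.
    }
  }

  // The proposed change: drop the fragment before the id is stored.
  const canonical = new URL(id.href);
  canonical.hash = "";
  return canonical.href;
}
```

With the row from the table above, `processId(undefined, "https://example.com/my-app/#here")` returns `"https://example.com/my-app/"`, so the stored value can be used directly as a primary key or hashed.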

## Historical analysis

I'm a bit confused because an earlier draft of this text in the original PR #988 included stripping out the fragment at parse time ([discussion](https://github.com/w3c/manifest/pull/988#pullrequestreview-761539027)). But it wasn't in the final version that was merged and I don't see where it got taken out. Note that Chromium [implemented stripping out the fragment at parse time](https://chromium-review.googlesource.com/c/chromium/src/+/3188751), in the same week as that PR, citing the then-draft of the manifest, and still does.

We recently ran into an issue in Chromium (CC @phoglenix) where some old app database entries mysteriously had a fragment in them, which led us to investigate this. Although we can't figure out where those old entries came from (since Chromium always stripped out the fragment), it has highlighted the dangers of having fragments in the URL used as the primary key in a database, when the fragments are supposed to be ignored for comparison purposes.

https://github.com/w3c/manifest/issues/1121

Received on Monday, 6 May 2024 04:34:12 UTC