RFC on Fragment Directives

Hello all,

I'd like to get some broader feedback on a proposal for a "fragment
directive". The basic idea is to reserve a section of the URL fragment for
"UA instructions", e.g.

https://example.org#fragment##fragment-directive

The idea is that the UA would process the fragment directive and strip it
from the URL during HTML document loading (to avoid interfering with
application script). The motivation for this is the ScrollToTextFragment
<https://github.com/WICG/ScrollToTextFragment/> proposal, and there are more
precise details on how this would work in the WHATWG issue
<https://github.com/whatwg/url/issues/445> I filed.
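
As a rough illustration of the stripping step, here is a minimal Python
sketch. It assumes the '##' delimiter shown in the example above, and the
function name split_fragment_directive is hypothetical; the actual processing
model is whatever the linked issue ends up specifying, so treat this as one
possible reading rather than the proposed algorithm.

```python
from typing import Optional, Tuple

# Delimiter taken from the example URL above; it is a proposal, not a standard.
DELIMITER = "##"


def split_fragment_directive(
    url: str, delimiter: str = DELIMITER
) -> Tuple[str, Optional[str]]:
    """Return (url_without_directive, directive), or (url, None) if absent."""
    start = url.find("#")  # the first '#' starts the fragment
    if start == -1:
        return url, None
    tail = url[start:]  # keep the leading '#' so a bare '##' is detected
    cut = tail.find(delimiter)
    if cut == -1:
        return url, None
    # Everything before the delimiter stays visible to application script.
    return url[:start] + tail[:cut], tail[cut + len(delimiter):]


print(split_fragment_directive("https://example.org#fragment##fragment-directive"))
print(split_fragment_directive("https://example.org##fragment-directive"))
```

Under these assumptions the first call leaves https://example.org#fragment
for the page to see, and the second leaves https://example.org with no
fragment at all.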

Most of the logic here would be changes to HTML loading, but there are some
implications for URL parsing. We need some way of delimiting the fragment
directive from the rest of the fragment. We've currently proposed using
'##'. This makes it ergonomic to use, since a fragment directive can be
specified without a fragment:

https://example.org##fragment-directive

and non-implementing UAs would likely treat this as a regular fragment. The
catch is that this fails strict validation today since '#' is not a valid
code point <https://url.spec.whatwg.org/#url-writing> inside a fragment;
we'd have to change the URL spec to allow '#' inside a fragment. The upside
of '##' being invalid today is that we're far less likely to encounter and
break existing web content that uses it.
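
To make the lenient-versus-strict gap concrete, here is a small Python
sketch. It uses urllib.parse.urlsplit as a stand-in for a permissive tool (it
is not a WHATWG URL parser), and fragment_is_strictly_valid is a hypothetical,
simplified check against the ASCII URL code points listed in the URL spec; it
passes '%' and non-ASCII characters through without validating them.

```python
import string
from urllib.parse import urlsplit

# ASCII subset of the URL spec's "URL code points"; note that '#' is absent.
ASCII_URL_CODE_POINTS = set(
    string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~"
)


def fragment_is_strictly_valid(fragment: str) -> bool:
    """Simplified check: '%' and non-ASCII are let through unvalidated."""
    return all(
        ch in ASCII_URL_CODE_POINTS or ch == "%" or ord(ch) > 0x7F
        for ch in fragment
    )


# A permissive parser splits at the first '#' and keeps the rest as fragment.
fragment = urlsplit("https://example.org##fragment-directive").fragment
print(repr(fragment))
print(fragment_is_strictly_valid(fragment))
```

Here the permissive split yields the fragment '#fragment-directive', which
the simplified strict check rejects because of the embedded '#'.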

The downside is that existing validators and tools might
misinterpret/misparse these kinds of URLs. I've tried a number of tools
(major browsers, web apps, etc.) and they all appear to be more permissive
than the spec. However, validators strictly adhering to the spec will fail
on these URLs (I've found some online examples
<https://quuz.org/url/uri-validate.html>) so there is some non-trivial
compat risk.

As an alternative, we're evaluating whether we could use delimiters made of
valid code points, e.g. '@@':

https://example.org#@@fragment-directive

These would be less ergonomic since we'd always have to add a fragment to
the URL, but they pose less compat risk to existing URL validation. The flip
side is that they pose more web-compat risk since existing web pages could
already have these delimiters in their fragments. We're working on getting
some concrete compat data for some candidate delimiters.
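
One simple way such compat data could be gathered is to count how often each
candidate delimiter already appears in the fragments of a URL corpus. The
Python sketch below assumes that approach; count_delimiter_hits is a
hypothetical helper and the sample URLs are made up purely for illustration,
not real measurements.

```python
from typing import Dict, Iterable, Sequence

# '##' and '@@' are the candidates mentioned in this thread.
CANDIDATE_DELIMITERS = ("##", "@@")


def count_delimiter_hits(
    urls: Iterable[str], delimiters: Sequence[str] = CANDIDATE_DELIMITERS
) -> Dict[str, int]:
    """Count URLs whose fragment (including the leading '#') has each delimiter."""
    hits = {d: 0 for d in delimiters}
    for url in urls:
        _, sep, rest = url.partition("#")
        tail = sep + rest  # empty string when the URL has no fragment
        for d in delimiters:
            if d in tail:
                hits[d] += 1
    return hits


# Made-up sample corpus, for illustration only.
sample_urls = [
    "https://example.org/page#section-2",
    "https://example.org/app#user@@host",
    "https://example.org/doc##top",
]
print(count_delimiter_hits(sample_urls))
```

A higher count for a delimiter in real-world data would suggest more
web-compat risk for that choice.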

Does anyone in this community have thoughts on this proposal? Specifically,
is there a good way to concretely evaluate how much risk there is in
changing URL parsing? Is there any precedent we could look at?

Thanks,
David

Received on Thursday, 5 September 2019 16:37:54 UTC