- From: David Sheets <kosmo.zb@gmail.com>
- Date: Sun, 28 Apr 2013 00:54:54 +0100
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: "www-style@w3.org" <www-style@w3.org>, WebApps WG <public-webapps@w3.org>, Roy Fielding <fielding@gbiv.com>, Larry Masinter <masinter@adobe.com>
Dear Anne,

On Thu, Apr 25, 2013 at 12:34 PM, Anne van Kesteren <annevk@annevk.nl> wrote:
> Background reading: http://dev.w3.org/csswg/selectors/#local-pseudo
> and http://url.spec.whatwg.org/

Local link pseudoselector, as presently specified, seems to be tightly coupled to the absolute URI design of the domain from which the resource is served. This seems quite brittle and will cause weird behavior when resources are inevitably rearranged. Consider the case of a compressed archive with markup documents and CSS files.

I believe the web already has several syntaxes for dealing with this and related problems. I propose standards harmonization below.

> :local-link() seems like a special case API for doing URL comparison
> within the context of selectors. It seems like a great feature, but
> I'd like it if we could agree on common comparison rules so that when
> we eventually introduce the JavaScript equivalent they're not wildly
> divergent.

I agree that URI comparison is quite important and powerful. I am very excited to see a consistent and simple comparison specification emerge.

> Requests I've heard before I looked at :local-link():
>
> * Simple equality
> * Ignore fragment
> * Ignore fragment and query
> * Compare query, but ignore order (e.g. ?x&y will be identical to
> ?y&x, which is normally not the case)
> * Origin equality (ignores username/password/path/query/fragment)

These are all types of pattern-matching.

> * Further normalization (browsers don't normalize as much as they
> could during parsing, but maybe this should be an operation to modify
> the URL object rather than a comparison option)

This is a function specification.

> :local-link() seems to ask for: Ignore fragment and query and only
> look at a subset of path segments. However, :local-link() also ignores
> port/scheme which is not typical. We try to keep everything
> origin-scoped (ignoring username/password probably makes sense).
> Furthermore, :local-link() ignores a final empty path segment, which
> seems to mimic some popular server architectures (although those
> ignore most empty path segments, not just the final), but does not
> match URL architecture.

Fundamentally, comparison is about structural pattern-matching. As it happens, the WWW already has an incredibly widespread syntax which internally performs pattern-matching: relative URI references.

(aside: mathematically, relative URI refs are hylomorphic function specifications on otherwise opaque identifier strings)

To that end, I propose a factorization of the tightly coupled parsing specification in WHATWG URL resulting in 3 separate functions:

1. parsing
2. normalization
3. relative reference resolution

Once this specification is properly factored, discussion, specification and implementation of the pattern-matching semantics of :local-link() and equivalent JavaScript functionality becomes much easier. This is elucidated in my proposal below.

> For JavaScript I think the basic API will have to be something like:
>
> url.equals(url2, {query:"ignore-order"})
> url.equals(url2, {query:"ignore-order", upto:"fragment"}) // ignores fragment
> url.equals(url2, {upto:"path"}) // compares everything before path,
> including username/password
> url.origin == url2.origin // ignores username/password
> url.equals(url2, {pathSegments:2}) // implies ignoring query/fragment

PROPOSAL

I believe the primary objective of this work is to define a syntax for URI patterns. There are presently 2 different but compatible standard URI pattern syntaxes:

1. RFC 3986 Relative URI references
2. RFC 6570 <http://tools.ietf.org/html/rfc6570> URI templates

Through re-use of these pattern syntaxes, the :local-link() pseudoselector gains incredible flexibility, expressivity, consistency, and (to my eye) simplicity.

I will use the notation [path] | [pattern] for pattern matching.
I haven't worked out *all* the details yet, but here are some possible patterns to get us started:

"" = own-document links
"." = this document's path ("/foo/bar/baz" | "." = "/foo/bar/" and "/foo/bar/" | "." = "/foo/bar/" | "")
"./" = this document's path or deeper (includes path-sibling resources and self)
".." = this document's parent path ("/foo/bar/" | ".." = "/foo/")
"../" = this document's parent path or deeper (includes aunts/uncles and self)
"../." = ".."
"/" = this domain (':local-link(0)')
"https:///" = HTTPS resources on this domain

Now, at this point, you may say "But, David, this syntax can't even express the current :local-link() examples!" and you would be right. However, this syntax can easily express links to resources *relative* to the present one which is a crucial feature for any URI pattern matching system.

Let's now consider the extension of this syntax with the syntax of RFC 6570. In particular, the construct we appear to be missing from the URI reference syntax is *binding*. RFC 6570 concerns URI construction but we can easily envision using the same syntax for the inverse of construction, destruction.
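As a rough illustration of the relative-reference patterns listed above (a sketch under stated assumptions, not a worked-out design): the helper names resolvePath and matchesRelative are hypothetical, the host "example.invalid" is a placeholder, and the rule that a pattern ending in "/" means "this path or deeper" is an assumption read off the pattern list. The sketch leans on the existing URL constructor purely for convenience, which means parsing and resolution stay coupled here.

```javascript
// Hypothetical helper (not from any spec): resolve a relative reference
// against a document path and return the resulting path.
// "example.invalid" is a placeholder host required by the URL constructor.
function resolvePath(docPath, ref) {
  return new URL(ref, "http://example.invalid" + docPath).pathname;
}

// Hypothetical matcher. Assumption: a pattern ending in "/" matches the
// resolved path or anything deeper; any other pattern matches only the
// resolved path itself. Query and fragment are not considered here.
function matchesRelative(pattern, docPath, linkPath) {
  const target = resolvePath(docPath, pattern);
  return pattern.endsWith("/")
    ? linkPath.startsWith(target)
    : linkPath === target;
}

console.log(resolvePath("/foo/bar/baz", "."));  // "/foo/bar/"
console.log(resolvePath("/foo/bar/", ".."));    // "/foo/"
console.log(matchesRelative("../", "/foo/bar/", "/foo/qux/doc")); // true
console.log(matchesRelative("..", "/foo/bar/", "/foo/qux/doc"));  // false
```

The first two calls reproduce the "." and ".." examples from the list; the last two show the assumed difference between ".." (the parent path only) and "../" (the parent path or deeper).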
Because we do not need to bind structural elements into a local environment, I propose the adoption of "{}" as the self-match syntax (not yet allowed under RFC 6570 but could easily be a constructor no-op) and "{_}" as the wildcard match (though "{comment}" could be used for commentary or when porting patterns from systems which *do* bind into environments):

"{}" = "" = self
"{_}" = "./{_}" = siblings or self (but not deeper; "/foo/bar/" | "{_}" = "/foo/bar/{any single path segment}")
"{}/" = any descendant of this document ("/foo/bar" | "{}/" = "/foo/bar/{anything}" and "/foo/bar/" | "{}/" = "/foo/bar//{anything}")
"{_}/" = anything with a deeper same-prefix path as this document ("/foo/bar/" | "{_}/" = "/foo/bar/{anything}")
"{}/{_}" = any nieces/nephews/children of this document ("/foo/bar/" | "{}/{_}" is "/foo/bar/{any}/{other}")
"/{}" = the resource with the identity of the first path segment equivalent to this document
"/{_}" = any first-level resource
"/{}/" = same first path segment (':local-link(1)')
"/{_}/" = any resource at least 1 level deep
"/~{_}/" = any resource with first path segment beginning with "~"

All of these patterns ignore the fragment (it indicates a delegated resource subordinate to the primary resource) and require the same username/password. None of these patterns match URIs with query strings (their semantics are totally dependent on the server).

If you wish to match URIs with query strings, the syntax is simple:

"{?}" = self with same query string (incl. none)
"{?_}" = self with any query string (incl. none)
"{}/{?}" = any descendant with same query string (incl. none)
"{}/{?_}" = any descendant with any query string (incl. none)

Perhaps you wish to style only certain same-document references:

"#defn-{_}" = any same-document reference to fragments beginning with "defn-"

I understand these semantics are not trivial to implement and I have begun a prototype implementation in my URI library.
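Independently of that prototype, a minimal sketch of how a few of the absolute-path patterns above might be checked. This is an illustration only: matchesPattern is a hypothetical name, it handles just the patterns that begin with "/" (literal segments, "{}", "{_}" and prefixed forms such as "~{_}"), it assumes a trailing "/" means "at least one level deeper", and it leaves out the "{?}" and "#..." forms. It uses the URL constructor for convenience.

```javascript
// Hypothetical matcher for absolute-path patterns such as "/{}/" or "/~{_}/".
// Assumptions: "{}" means "same segment as the current document",
// "{_}" means "any non-empty segment" (optionally after a literal prefix),
// and a trailing "/" in the pattern means "at least one level deeper".
function matchesPattern(pattern, docUrl, linkUrl) {
  const doc = new URL(docUrl);
  const link = new URL(linkUrl, docUrl);

  // Same origin and same username/password; the fragment is simply ignored.
  if (link.origin !== doc.origin) return false;
  if (link.username !== doc.username || link.password !== doc.password) return false;
  // These patterns never match a link that carries a query string.
  if (link.search !== "") return false;

  const docSegs = doc.pathname.split("/").slice(1);
  const linkSegs = link.pathname.split("/").slice(1);
  const patSegs = pattern.split("/").slice(1);

  // A trailing "/" leaves an empty last segment: treat it as "or deeper".
  const openEnded = patSegs[patSegs.length - 1] === "";
  if (openEnded) patSegs.pop();
  if (openEnded ? linkSegs.length <= patSegs.length
                : linkSegs.length !== patSegs.length) {
    return false;
  }

  return patSegs.every((p, i) => {
    const seg = linkSegs[i];
    if (p === "{}") return seg === docSegs[i];
    if (p.endsWith("{_}")) return seg !== "" && seg.startsWith(p.slice(0, -3));
    return seg === p; // literal segment
  });
}

const here = "http://example.com/foo/bar/baz";
console.log(matchesPattern("/{}/", here, "/foo/qux"));        // true
console.log(matchesPattern("/{}/", here, "/other/qux"));      // false
console.log(matchesPattern("/{}/", here, "/foo/qux?x=1"));    // false
console.log(matchesPattern("/{_}", here, "/top"));            // true
console.log(matchesPattern("/~{_}/", here, "/~alice/notes")); // true
```

The "/{}/" case corresponds to ':local-link(1)' in the list above; the query-string case returns false because, as stated, none of these patterns match URIs with query strings.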
The benefits of reusing the syntax and semantics of relative URI references and URI templates are manifest. Structural pattern matching (or, in this case, relative URI reference predicates) is incredibly powerful as I hope I have demonstrated. Additionally, this design leverages knowledge and important design constraints found in these other specifications and their users.

I have not yet devised algebraic solutions for query string permutation or URI normalization. Disjunction of patterns is possible through normal CSS selector disjunction.

A major barrier to widespread deployment of a system of this kind is the WHATWG URL specification. By unnecessarily coupling parsing, normalization, and relative reference resolution, implementations conforming to only the WHATWG URL specification cannot offer developers control over the level and type of normalization nor the ability to manipulate relative URIs without resolving them. Humanity deserves a better foundation on which to construct algebras over its global namespace.

As for speedy deployment, I would rather start on the path toward correct, consistent, and powerful pattern matching than see something rushed into standards due to feature anxiety. 3 or 6 more months to get this language right is a constant factor on a potentially unbounded technology lifetime.

I hope you've found this design proposal stimulating and I warmly welcome any and all constructive (or destructive) response.

Happy Holidays,

David Sheets
Received on Sunday, 28 April 2013 00:06:33 UTC