- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 27 Sep 2007 06:13:00 +0200
- To: www-archive@w3.org
Hi,

(I wrote this many moons ago, it's incomplete and there are a few
errors below, but I've been asked to post this somewhere; maybe someone
corrects and completes it. It's the most complete analysis of this kind
that I know of.)

In the following I will describe the transformation of a CSS Selector
into an equivalent XPath expression. This is a top-down process, and it
cannot be applied to all Selectors. The exceptions will be pointed out
below.

Note further that no optimization is applied here, either to the input
selector or to the resulting XPath expression. A range of possible
optimizations can be applied to the result; research material on this
matter is readily available.

It is assumed that the Selector only uses characters that are allowed,
in the position where they are used, in XML documents. It is, for
example, possible to construct a selector that matches on elements that
include the character U+0001 in their name. This is not allowed in XML
documents and as such not in XPath expressions. Even though the
transformation described herein does not generate node tests that would
be affected by this, the character U+0001 cannot occur in literals in
XPath expressions either.

In XPath, strings are matched case-sensitively; in Selectors, in some
cases, strings are matched case-insensitively. Telling the difference
requires static knowledge of the artifact that is being matched. It is
assumed that such knowledge is not available.

Similarly, selectors that rely on information not included in the XPath
data model cannot be transformed. This applies to the pseudo-classes
:checked, :enabled, :disabled, :target, :focus, :hover, :active, :link,
and :visited. It would be possible to define extension functions that
allow these to be represented, but this is out of scope of this
document and the transformation would be straightforward in any case.

The :...-of-type pseudo-classes cannot be transformed if they are bound
to a subject for which the local name or namespace name is not known;
for example, *|*:nth-of-type(3) cannot be transformed. Naturally, if an
implementation evaluates the expression for each node in the tree, it
could generate an expression for each particular node, but this is not
the design goal in this document.

The XPath id() function is used to transform the #id selector; the two
specifications have incompatible requirements regarding duplicate IDs.
While it would be possible to use other means than id() if IDness could
be externally determined, this is considered out of scope.

Class selectors are language-specific, in many cases the ...

Pseudo-elements do not occur in the XPath data model and as such are
ignored by the transformation. A selector with a pseudo-element is
transformed into an expression that corresponds to the selector without
the pseudo-element.

Strings in selectors are assumed to exclude the ' character. The '
character is used to delimit strings in the generated XPath
expressions, and such a string cannot itself contain the ' character
because, unlike Selectors, XPath does not support character escapes. It
is, however, possible to transform any given string into a concat(...)
expression that represents the original string, so

  [foo="\"'"] -> [ @foo = concat('"', "'") ].

First, we transform some selectors into equivalent selectors. These
rewrites reduce the number of transformations to be performed later. It
should be noted that this pre-processing generally increases the
complexity of the resulting XPath expressions. For example, :only-child
is just "count(../*)=1" normally, and would yield a very long
expression with this pre-processing applied. As explained above, the
processes defined in this document are not optimized.
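
The concat(...) rewrite for strings containing the ' character,
described above, can be sketched roughly as follows. This is a minimal
Python sketch and not part of the original analysis; the function name
xpath_literal is made up for illustration and does not come from any
library.

```python
def xpath_literal(s: str) -> str:
    """Return an XPath 1.0 expression that evaluates to the string s.

    Hypothetical helper: XPath 1.0 has no character escapes, so a
    string holding both quote characters must be built with concat().
    """
    if "'" not in s:
        return "'" + s + "'"
    if '"' not in s:
        return '"' + s + '"'
    # Both quote characters occur: split on ' and put a double-quoted
    # apostrophe literal between the single-quoted pieces.
    parts = []
    for i, piece in enumerate(s.split("'")):
        if i > 0:
            parts.append('"\'"')
        if piece:
            parts.append("'" + piece + "'")
    return "concat(" + ", ".join(parts) + ")"


# The example from the text: the selector string made of " followed by '
print(xpath_literal('"\''))
```

The two early returns are a small shortcut: when only one kind of quote
occurs, the other kind can delimit the literal directly and concat() is
not needed.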

  :only-child    -> :first-child:last-child
  :first-child   -> :nth-child(1)
  :last-child    -> :nth-last-child(1)
  :only-of-type  -> :first-of-type:last-of-type
  :first-of-type -> :nth-of-type(1)
  :last-of-type  -> :nth-last-of-type(1)

We start with "//*" which matches any element and define predicates.
...

  :root

    not(parent::*)

  :nth-child(an+b)
  :nth-last-child(an+b)
  x|y:nth-of-type(an+b)
  x|y:nth-last-of-type(an+b)

    These are transformed using the following template:

      ((_a = 0 and (count(_DIR-sibling::*[_Y]) + 1) = _b) or
       (_a > 0 and not((count(_DIR-sibling::*[_Y]) + 1) < _b) and
        (((count(_DIR-sibling::*[_Y]) + 1) - _b) mod _a) = 0) or
       (_a < 0 and not((count(_DIR-sibling::*[_Y]) + 1) > _b) and
        ((_b - (count(_DIR-sibling::*[_Y]) + 1)) mod (-1*_a)) = 0))
      and parent::*

    _a and _b are derived from the an+b expression in the selector.
    _Y is the predicate derived from the type selector x|y in case of
    :nth-of-type and :nth-last-of-type, and "true()" otherwise. _DIR
    is "preceding" for :nth-child and :nth-of-type, and "following"
    for :nth-last-child and :nth-last-of-type. The divisor -1*_a is
    parenthesized because "mod" and "*" have the same precedence in
    XPath and associate to the left.

    Note that in case of :...-of-type the local name and the namespace
    name have to be defined.

  :empty

    not(* or text())

  :not(s)

    not(self::*[_s])

    _s is the predicate derived from s. Care must be taken when
    parsing s and handling namespaces: the default namespace does not
    apply to s unless s is a type or universal selector.

  [x|y]
  [|y]
  [y]

    @*[ namespace-uri() = _x ][ local-name() = _y ]

    _x is the namespace name associated with the prefix x. If the
    attribute in the selector is in no namespace, this is ''. _y is
    the local name y.

  [*|y]

    @*[ local-name() = _y ]

    _y is as specified above.

The following transformations add predicates; namespace handling is
therefore ignored, as are other syntactic details like use of strings
versus identifiers.

  [x=y]

    . = _y

  [x~=y]

    not(contains(normalize-space(_y), ' ')) and (
      . = _y or
      starts-with(normalize-space(.), concat(_y, ' ')) or
      contains(normalize-space(.), concat(' ', _y, ' ')) or
      substring(normalize-space(.),
                string-length(normalize-space(.)) + 1
                  - string-length(concat(' ', _y)))
        = concat(' ', _y))

    @@ This actually does not seem right. Perhaps it should be

      contains(concat(' ', normalize-space(.), ' '),
               concat(' ', _y, ' '))

  [x|=y]

    . = _y or starts-with(., concat(_y, '-'))

  [x^=y]

    starts-with(., _y)

  [x$=y]

    substring(., string-length(.) - string-length(_y) + 1) = _y

  [x*=y]

    contains(., _y)

  #y

    . = //id(_y)

  x|y
  *

    Type selectors and universal selectors are transformed into

      [ namespace-uri() = _x ][ local-name() = _y ]

    where _x and _y are the namespace name and local name
    respectively, and, where undefined, one or both of the predicates
    are omitted.

Even longer ago I wrote a partial implementation of the translation,

  * http://perl-css.cvs.sourceforge.net/perl-css/CSS-SAC/lib/CSS/SAC/Selector/ToXPath.pm

It handles some of the combinators not yet discussed here.

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
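
For reference, the :nth-child / :nth-of-type template given earlier in
this message can be instantiated mechanically. The following is a
minimal Python sketch, not the Perl implementation linked above; the
function name nth_predicate and its parameters are hypothetical. As a
simplification it assumes _a and _b are already known integers, so it
picks the matching branch of the template at generation time instead of
emitting all three.

```python
def nth_predicate(a: int, b: int, last: bool = False,
                  y: str = "true()") -> str:
    """Build the XPath predicate for an an+b structural pseudo-class.

    a, b : integers taken from the an+b expression
    last : True for :nth-last-child / :nth-last-of-type
    y    : predicate for the type selector (the template's _Y),
           "true()" for the -child variants
    """
    direction = "following" if last else "preceding"
    # 1-based position of the context node among the counted siblings
    pos = f"(count({direction}-sibling::*[{y}]) + 1)"
    if a == 0:
        test = f"{pos} = {b}"
    elif a > 0:
        test = f"not({pos} < {b}) and (({pos} - {b}) mod {a}) = 0"
    else:
        # a < 0: the divisor is written as the positive value -a
        test = f"not({pos} > {b}) and (({b} - {pos}) mod {-a}) = 0"
    return f"({test}) and parent::*"


# :first-child was rewritten to :nth-child(1), i.e. a = 0 and b = 1
print("//*[" + nth_predicate(0, 1) + "]")
```

For :nth-last-child(2n+1) one would call nth_predicate(2, 1,
last=True); for the -of-type variants, y carries the namespace-uri()
and local-name() tests for the type selector.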
Received on Thursday, 27 September 2007 04:13:10 UTC