- From: Peter B. West <pbwest@powerup.com.au>
- Date: Tue, 29 Oct 2002 02:52:12 +1000
- To: xsl-editors <xsl-editors@w3.org>
There has been a flurry of well-deserved back-slapping about the first anniversary of the spec. Let me add my sincere congratulations. However, speaking as a Common or Garden Hacker, engaged in a sanity-threatening struggle to implement this thing...

Éric Bischoff wrote:
(..following from Nikolai Grigoriev..)

>>The overall impression is that the respective part of the spec
>>is still underdeveloped: XPath-like expression syntax is mixed
>>to CSS data types, but no tight integration was made. Data type
>>definitions in Section 5.11 (copied from CSS) refer to the string
>>representation of data tokens as if there were no expressions
>>at all: there is no information about mapping of productions
>>in 5.9 to property data types.

Not to put too fine a point on this, it's a shambles, and it's getting worse. The editors have recently been looking at aspects of <string>, which floats freely over the spec, descending wherever additional confusion is required.

The particular question concerned the "format" property, which has been imported holus-bolus from XSLT, to the extent that the description of the property simply refers the reader to the XSLT spec. The problem is that the expression environment of XSLT is utterly different to that of XSL-FO. In XSLT, "format='1.'" is unexceptional; in FO, it's a number.

In recognition of this fundamental difference, we now have the DWIM clause. (Do What I Mean - not what I say, unless that's also what I mean.)

<quote>
>> Given the allowable Expression Value Conversions (section 5.9.12),
>> a property value of type <string> must be a quoted value, an NCName,
>> or a expression that evaluates to a <string>; anything else (e.g.,
>> an integer) is an error. An implementation may recover from this
>> error by treating the unevaluated property value as a string.
>> </quote>

For context, here is the definition from 5.11 to which this note is being added:

<quote>
<string>
A sequence of characters.
</quote>

Very Bauhaus; but to be fair, a useful definition is (almost) given in:

<quote>
5.9.8 Strings
Strings are represented either as literals or as an enumeration token.
</quote>

Enumeration token?

The DWIM clause is a natural progression from having under-the-hood conversion of NCNames to <strings> - a user is entitled to feel aggrieved if an assignment sometimes works without those pointless and infuriating extra quotes, and sometimes falls over, depending on content. Unkind souls (like me) might suggest that a natural REgression is in order - names is names, and lits is lits, and never the twain shall meet, so enforce the quotes.

However, there's a new twist to strings in the Errata.

<quote>
The expression language supports operations on a limited set of datatypes. These do not include <angle>, <time>, and <frequency>. Values of these datatypes must be strings in the expression language. The definition of these datatypes specify the allowed form of these strings.
</quote>

Surely all of these number/unit specifiers - scalars, for want of a better word - should be allowed to take their natural course. Instead of an arbitrary and confusing override like the above, specify the full set of "scalars" and lay out a table of the arithmetic operations that are allowed and disallowed between them.

Is the intention of the above that users express these quantities as literals? Let me guess. That leaves the DWIM clause. It might also be useful in that context to allow numbers to be numbers, not lengths of unspecified unit to power zero.

> Even worse : some productions do not match the rest of the document. For
> example, the production for a function does not admit whitespace, while at
> several other places whitespace is admitted before '(' or around the
> arguments.
>
>
>>Therefore, I am inclined to believe that extra quotes shall be
>>excluded in both integers and URIs. "'url(...)'" is a Literal
>>whose value is 'url(...)'; it naturally maps to <string> datatype
>>that is a different datatype from <uri-specification>.
>
>
> Yes, and that is quite natural: one must have a way to quote data for them not
> to be interpreted, as function or as anything else. Saying "'5pt'" is a way
> for a user to pass the <string> made of the characters '5' 'p' 't' and not a
> <length>. It is very nice to have a quoting process that disables
> interpretation, it can always be useful.
>
>
>>Unless
>>a conversion is explicitly permitted by the spec, these two
>>should be kept separate.
>
>
> Yes. And yes again.
> I strongly push forward an in-depth normalization of all the data types,
> functions and syntax stuff, so that :
> - productions match data types, functions and operators
> - there is a better separation between lexical analysis and syntax analysis
> - data types (the things you can communicate with functions like
> from-parent()) are clearly distinguished from the initialization constructs.
> - what looks like a function is a function, what looks like an operator is an
> operator, and so on
> - perhaps we even get rid of the implicit conversions mechanism and
> explicitly list all the allowed argument types for all functions.
>
> Such an in-depth normalization would allow to get rid of the many complex
> explanations that currently exist to explain that <angle>s are in fact
> <string>s and that <percentage>s are in fact <length>s or <number>s (just to
> take two examples).
>
> Let's take an example to illustrate what I mean. A construct that keeps
> puzzling me is the <percentage> "data type".
>
> <percentage> is described as a data type, therefore one could imagine that
> from-parent() could return "50%" for example. But it is specified at several
> places that percentages are evaluated first, so one may think it just as an
> initialization construct, and pass a <length> or <number> through the
> functions.
> Except that in a few properties it seems possible to pass
> percentages (for example background-position-horizontal) according to the
> description of the property. So at the end what is it? A real data type or
> just an initialization construct?

The specification contradicts itself on percentages. Percentages are purportedly evaluated in the process of FO tree building, and FO tree building is a process which logically precedes area tree building. However, most percentage values are defined in relation to areas, and they cannot, in general, be resolved until the area tree is being built.

I was like a dog chasing its tail until I finally abandoned the attempt to resolve such percentages in expressions at the time of FO tree building.

From my point of view, then, a percentage is a data type of its own. This emphasizes the distinction made in the spec: a percentage is a relative length. The other relative length, an ems value, can always be resolved during tree building.

> Such a normalization could be for XSL-FO 2.0, as it might introduce tiny
> differences between what was valid and what becomes valid. It would make the
> specification much less subtle and much less subject to interpretation.
> Implementers would benefit from it as it would make engines much simpler and
> robust, and end users would benefit from it as they would retrieve widely
> accepted notions as basic data types.
>

Peter
--
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"
Received on Monday, 28 October 2002 11:51:57 UTC
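
For readers who want the <string> note quoted in the message in concrete form, here is a rough Python sketch of one way an implementation might read it. It is only an illustration: the function name `string_value`, the `recover` flag, and the ASCII-only NCName pattern are assumptions made for this sketch, and the third permitted case (an expression that evaluates to a <string>) is left out because it needs a real expression evaluator.

```python
import re

# Simplified, ASCII-only stand-in for the XML NCName production
# (assumption: the real production admits many more characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")


def string_value(raw, recover=False):
    """Return the <string> denoted by a raw property value.

    Hypothetical helper: accepts a quoted literal or a bare NCName.
    Anything else is an error, unless recover=True, in which case the
    unevaluated property value is returned as the string.
    """
    text = raw.strip()
    # Quoted literal: the same quote character at both ends.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    # A bare NCName is silently converted to a string.
    if NCNAME.match(text):
        return text
    # Everything else (for example a number such as 1.) is an error,
    # with the optional recovery the note allows.
    if recover:
        return raw
    raise ValueError("not a <string>: %r" % raw)
```

Under these assumptions, `string_value("'1.'")` yields the string `1.`, a bare `1.` raises an error, and `string_value("1.", recover=True)` falls back to the raw text, which is the "sometimes works, sometimes falls over" behaviour the message complains about.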