Re: XSL Errata document updated from Peter B. West on 2002-10-28 (www-xsl-fo@w3.org from October 2002)

From: Peter B. West <pbwest@powerup.com.au>
Date: Tue, 29 Oct 2002 02:50:33 +1000
To: www-xsl-fo@w3.org
Message-ID: <3DBD6AD9.2030005@powerup.com.au>
There has been a flurry of well-deserved back-slapping about the first
anniversary of the spec.  Let me add my sincere congratulations.
However, speaking as a Common or Garden Hacker, engaged in a
sanity-threatening struggle to implement this thing...

Éric Bischoff wrote:
(..following from Nikolai Grigoriev..)
   >>The overall impression is that the respective part of the spec
   >>is still underdeveloped: XPath-like expression syntax is mixed
   >>to CSS data types, but no tight integration was made. Data type
   >>definitions in Section 5.11 (copied from CSS) refer to the string
   >>representation of data tokens as if there were no expressions
   >>at all: there is no information about mapping of productions
   >>in 5.9 to property data types.

Not to put too fine a point on this, it's a shambles, and it's getting
worse.  The editors have recently been looking at aspects of <string>,
which floats freely over the spec, descending wherever additional
confusion is required.  The particular question concerned the "format"
property, which has been imported holus-bolus from XSLT, to the extent
that the description of the property simply refers the reader to the
XSLT spec.

The problem is that the expression environment of XSLT is utterly
different to that of XSL-FO.  In XSLT, "format='1.'" is unexceptional;
in FO, it's a number.  In recognition of this fundamental difference, we
now have the DWIM clause.  (Do What I Mean - not what I say, unless
that's also what I mean.)

<quote>
   >>  Given the allowable Expression Value Conversions (section 5.9.12),
   >>  a property value of type <string> must be a quoted value, an NCName,
   >>  or a expression that evaluates to a <string>; anything else (e.g.,
   >>  an integer) is an error.  An implementation may recover from this
   >>  error by treating the unevaluated property value as a string.
   >>
</quote>

For context, here is the definition from 5.11 to which this note is
being added:

<quote>
<string>
       A sequence of characters.
</quote>

Very Bauhaus; but to be fair, a useful definition is (almost) given in:

<quote>
5.9.8 Strings
Strings are represented either as literals or as an enumeration token.
</quote>

Enumeration token?

The DWIM clause is a natural progression from having under-the-hood
conversion of NCNames to <strings> - a user is entitled to feel
aggrieved if an assignment sometimes works without those pointless and
infuriating extra quotes, and sometimes falls over, depending on
content.  Unkind souls (like me) might suggest that a natural REgression
is in order - names is names, and lits is lits, and never the twain
shall meet, so enforce the quotes.
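
To make the content-dependence concrete, here is a minimal sketch in
Python of how a parser might apply the DWIM clause quoted above.  It is
an illustration under stated assumptions, not anything taken from the
spec or from an existing implementation: the function name
string_property and its recover flag are hypothetical, the NCName
pattern is an ASCII-only simplification, and the third permitted case
(an expression that evaluates to a <string>) is left out entirely.

```python
import re

# ASSUMPTION: ASCII-only approximation of an XML NCName; the real
# production admits many more characters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
# A quoted literal: '...' or "..." with no embedded matching quote.
LITERAL = re.compile(r"""("[^"]*"|'[^']*')""")


def string_property(raw, recover=True):
    """Return (value, recovered) for a property of type <string>.

    Hypothetical helper: 'recovered' is True when the DWIM recovery
    (treat the unevaluated property value as a string) was used.
    """
    text = raw.strip()
    if LITERAL.fullmatch(text):
        return text[1:-1], False   # quoted value: strip the quotes
    if NCNAME.fullmatch(text):
        return text, False         # NCName converts to <string>
    if recover:
        return raw, True           # error, recovered per the DWIM clause
    raise ValueError(f"not a <string>: {raw!r}")
```

With this sketch, "'1.'" and "hello" are both accepted outright, while
a bare "1." is an error that is either recovered from or raised,
depending on the implementation's choice.  That is exactly the
sometimes-works, sometimes-falls-over behaviour complained of above.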

However, there's a new twist to strings in the Errata.

<quote>
The expression language supports operations on a limited set of
datatypes. These do not include <angle>, <time>, and <frequency>. Values
of these datatypes must be strings in the expression language. The
definition of these datatypes specify the allowed form of these strings.
</quote>

Surely all of these number/unit specifiers - scalars, for want of a
better word - should be allowed to take their natural course.  Instead
of an arbitrary and confusing override like the above, specify the full
set of "scalars" and lay out a table of the arithmetic operations that
are allowed and disallowed between them.  Is the intention of the above
that users express these quantities as literals?  Let me guess.  That
leaves the DWIM clause.

It might also be useful in that context to allow numbers to be numbers,
not lengths of unspecified unit to power zero.
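
As a rough illustration of what such a table could look like, here is
a minimal Python sketch.  Everything in it is an assumption made for
the example: the Scalar class, the set of kinds, the entries in
ALLOWED, and the simplification that each value is already held in a
single base unit for its kind.  The actual table would be for the spec
to define.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    value: float
    kind: str  # "number", "length", "angle", "time" or "frequency"


OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "div": operator.truediv}

# (operator, left kind, right kind) -> result kind.
# ASSUMPTION: illustrative entries only; anything absent is disallowed.
ALLOWED = {(op, "number", "number"): "number" for op in OPS}
for k in ("length", "angle", "time", "frequency"):
    ALLOWED[("+", k, k)] = k
    ALLOWED[("-", k, k)] = k
    ALLOWED[("*", k, "number")] = k
    ALLOWED[("*", "number", k)] = k
    ALLOWED[("div", k, "number")] = k
    ALLOWED[("div", k, k)] = "number"   # the units cancel


def apply(op, left, right):
    """Apply op if the table allows it, otherwise raise TypeError."""
    try:
        kind = ALLOWED[(op, left.kind, right.kind)]
    except KeyError:
        raise TypeError(
            f"{left.kind} {op} {right.kind} is not allowed") from None
    return Scalar(OPS[op](left.value, right.value), kind)
```

Under these assumptions an <angle> plus an <angle> is an <angle>, a
<length> divided by a <length> is a plain number, and a <length> plus
a <time> is rejected by lookup rather than by special-case prose.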

   > Even worse : some productions do not match the rest of the document.
   > For example, the production for a function does not admit whitespace,
   > while at several other places whitespace is admitted before '(' or
   > around the arguments.
   >
   >
   >>Therefore, I am inclined to believe that extra quotes shall be
   >>excluded in both integers and URIs. "'url(...)'" is a Literal
   >>whose value is 'url(...)'; it naturally maps to <string> datatype
   >>that is a different datatype fom <uri-specification>.
   >
   >
   > Yes, and that is quite natural: one must have a way to quote data for
   > them not to be interpreted, as function or as anything else. Saying
   > "'5pt'" is a way for an user to pass the <string> made of the
   > characters '5' 'p' 't' and not a <length>. It is very nice to have a
   > quoting process that disables interpretation, it can always be useful.
   >
   >
   >>Unless
   >>a conversion is explicitly permitted by the spec, these two
   >>should be kept separate.
   >
   >
   > Yes.

And yes again.

   > I strongly push forward an in-depth normalization of all the data
   > types, functions and syntax stuff, so that :
   > - productions match data types, functions and operators
   > - there is a better separation between lexical analysis and syntax
   >   analysis
   > - data types (the things you can communicate with functions like
   >   from-parent()) are clearly distinguished from the initialization
   >   constructs.
   > - what looks like a function is a function, what looks like an
   >   operator is an operator, and so on
   > - perharps we even get rid of the implicit conversions mechanism and
   >   explicitly list all the allowed argument types for all functions.
   >
   > Such an in-depth normalization would allow to get rid of the many
   > complex explanations that currently exist to explain that <angle>s are
   > in fact <string>s and that <percentage>s are in fact <length>s or
   > <number>s (just to take two examples).
   >
   > Let's take an example to illustrate what I mean. A construct that keeps
   > puzzling me is the <percentage> "data type".
   >
   > <percentage> is described as a data type, therefore one could imagine
   > that from-parent() could return "50%" for example. But it is specified
   > at several places that percentages are evaluated first, so one may
   > think it just as an initialization construct, and pass a <length> or
   > <number> through the functions. Excepted that in a few properties it
   > seems possible to pass percentages (for example
   > background-position-horizontal) according to the description of the
   > property. So at the end what is it? A real data type or just an
   > initialization construct?

The specification contradicts itself on percentages.  Percentages are
purportedly evaluated in the process of FO tree building, and FO tree
building is a process which logically precedes area tree building.
However, most percentage values are defined in relation
to areas, and they cannot, in general, be resolved until the area tree
is being built.  I was like a dog chasing its tail until I finally
abandoned the attempt to resolve such percentages in expressions at the
time of FO tree building.  From my point of view, then, a percentage is
a data type of its own.  This emphasizes the distinction made in the
spec.: a percentage is a relative length.  The other relative length, an
ems value, can always be resolved during tree building.
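
A minimal Python sketch of that two-stage arrangement follows.  It is
only an illustration of the idea, not a description of any actual
implementation: the class and function names are hypothetical, lengths
are assumed to be held in points, and the base for each percentage is
assumed to be a single length supplied by the area tree builder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Length:
    points: float          # ASSUMPTION: absolute lengths held in points


@dataclass(frozen=True)
class Ems:
    factor: float

    def resolve(self, font_size):
        return Length(self.factor * font_size.points)


@dataclass(frozen=True)
class Percentage:
    percent: float

    def resolve(self, base):
        return Length(self.percent / 100.0 * base.points)


def build_fo_value(value, font_size):
    """FO tree building: ems resolve now; percentages pass through."""
    if isinstance(value, Ems):
        return value.resolve(font_size)
    return value


def build_area_value(value, area_base):
    """Area tree building: the base is now known, so resolve percentages."""
    if isinstance(value, Percentage):
        return value.resolve(area_base)
    return value
```

Here a value of 1.5em against a 12pt font becomes Length(18.0) during
FO tree building, whereas Percentage(50) survives that stage unchanged
and only becomes Length(200.0) once an area 400pt wide is available.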


   > Such a normalization could be for XSL-FO 2.0, as it might introduce
   > tiny differences between what was valid and what becomes valid. It
   > would make the specification much less subtile and much less subject
   > to interpretation. Implementers would benefit from it as it would make
   > engines much simpler and robust, and end users would benefit from it
   > as they would retrieve widely accepted notions as basic data types.
   >

Peter
-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"
Received on Monday, 28 October 2002 11:50:35 UTC