RE: [xsl] Using or ignoring Types in XSLT 2.0 / XPath 2.0 from Kay, Michael on 2003-05-15 (public-qt-comments@w3.org from May 2003)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Thu, 15 May 2003 06:10:37 +0200
To: Mike Haarman <mhaarman@infinitecampus.org>, public-qt-comments@w3.org
Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DCE39@daemsg02.software-ag.de>
> I recognize that the typing debate is quickly approaching 
> perma-thread status on XML-DEV and I did not intend to start 
> that pillow fight over here.  Better and better informed 
> minds than mine have been chewing on this for some time, but 
> now is the time to make my voice heard, if it is to be heard 
> at all, and this reply has been cross posted to 
> public-qt-comments@w3.org.

I will avoid cross-posting the result. In fact, I won't reply on xsl-list at
all, since temperatures are often cooled better by saying nothing. You can
always point people to this archived response if you wish.
> 
> > I still contend that type doesn't belong in XSLT, but if it is in 
> > there, it should make processes more efficient, not less. If type 
> > needs to be there, then all of XSD should be supported, such that an 
> > XSLT function can return an object of complex type Foo.
> 
> I concur.

It's a shame that the paragraph you quote states an opinion, without giving
any reasons why the author holds that view. I'm not sure what kind of
processes the author is referring to, or how he/she measures their efficiency.
The XSLT/XPath/XQuery model does support all of XML Schema, and does indeed
enable an XSLT function to return an object of a complex type.

> Two points to note:
> 
> 1) A range of XPath 2 functions display indeterminate 
> behavior in the absence of PSVI type annotation.

I don't know what you mean by this. Could you please elaborate? I think that
all the behavior is well-defined, except in a few quite particular cases
where we have chosen to make it implementation-defined (e.g. collating
sequences).

> I believe 
> this practically voids their utility for a very large set of 
> XML applications, that is: web-facing ones.

I think this conclusion is based on a false premise.

> Validation 
> remains far too expensive for non-trivial network 
> applications.  We validate coming in, but we can't afford to 
> validate going out.  Validation is a useful tool, but a 
> glaring inefficiency.

This is an interesting observation. For some application scenarios, you are
probably right. But there are also potential performance gains: for example,
validating the input once, up front, should mean that a data item used more
than once in the transformation is validated and converted to its target data
type only once, rather than each time it is used.

I suspect that the biggest part of validation cost is the cost of compiling
the schema into a form that constitutes an executable validator. It should
be possible to do this once, in the same way as stylesheets are compiled
once and executed repeatedly. If the compiled schema and the compiled
stylesheet are combined, there is considerable potential for efficiencies. I
think it would be a mistake to assume that performance impact will be a
show-stopper, or even that the net effect of introducing schemas will be
negative. It depends very much on the application.
> 
> As a consequence of this practical consideration, typing 
> offers little value for us.  We will continue to rely on Java 
> adaptors to wrap the results of our SQL statements in 
> well-formed XML 1.0 for presentation.  Business logic will 
> continue to live largely in SQL statements and Java classes 
> where typing is managed in their conventional ways.

You haven't really explained enough about your application architecture for
me to comment. Where does XSLT fit in?

Converting the same data between three different representations -
relations, objects, and XML - looks like the inefficient part of your
process to me. But I can't really judge.

It's worth pointing out that the type information in the data model used as
input to an XSLT transformation does not necessarily have to come from
schema validation. Implementors might well provide utilities that create a
type-annotated data model direct from a relational database. In this case,
the type information should be available for free, and there should
certainly be net benefits in making it available to the XSLT processor
rather than discarding it as happens currently.
> 
> The pfaffing of strings (love that word, Andrew) will 
> continue, but has never been a particular burden.  I wrote a 
> lovely universal calendar, porting a day-of-week algorithm 
> from C to XSLT and the sum of string pfaffing consisted of 
> three substring functions; a v2 of this stylesheet will still 
> use three functions to get what I need, but three discrete 
> functions on dateTime and a test and a cast in the absence of 
> validation.

Of course, XSLT 2.0 fully allows you to create things such as dates and
numbers by casting from strings. We don't see this as a "second-class" way
of working - it's an intrinsic feature of the language.
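
A minimal sketch of the "test and a cast" approach on input that has not been
validated. It assumes the function names of the final XPath 2.0 Recommendation
(the May 2003 drafts may name them differently), and the `event` element and
its `date` attribute are hypothetical:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Hypothetical input: <event date="2003-05-15"/>, with no schema -->
  <xsl:template match="event">
    <xsl:choose>
      <!-- Test first, so a malformed string does not raise an error -->
      <xsl:when test="@date castable as xs:date">
        <!-- Cast the untyped attribute value to xs:date -->
        <xsl:variable name="d" select="xs:date(@date)"/>
        <!-- Three discrete component functions replace substring() calls -->
        <xsl:value-of select="year-from-date($d)"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="month-from-date($d)"/>
        <xsl:text>-</xsl:text>
        <xsl:value-of select="day-from-date($d)"/>
      </xsl:when>
      <xsl:otherwise>not a valid date</xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```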
> 
> 2) A consequence of this reliance on validation-driven type 
> annotation is to effectively deprecate well-formedness in XML 
> processing.

Absolutely not. What gave you that impression?

> X(Path|SLT) 2.0 does not represent an 
> evolutionary step.  Developers and architects cannot simply 
> decide to use 2.0 because "it's the latest".  It is a 
> revolutionary change that implicates other choices.  It is a 
> paradigmatic shift, not a generational one and entails validation.

It's certainly a big step, one that greatly increases the range of
application of a language that has already proved highly successful in a
limited sphere. I think there are many features in both XSLT 2.0 and XPath
2.0 that will be highly attractive, reducing the difficulty of writing
complex transformations and extending the range of problems that can be
tackled. The emphasis in much discussion on the type system is unfortunate,
because many users will not be interested in subtleties of the type system,
and we have tried to design it so they don't need to be interested. A
stronger type system was necessary to achieve many of the other goals, but
it is not something that I regard as a benefit in its own right, unlike
user-defined functions, grouping, sequence manipulation, or regular
expression handling.
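
For illustration only, a short sketch of two of these features, grouping and
regular expressions, neither of which depends on a schema. The `items`/`item`
input and its `category` and `name` attributes are hypothetical, and the syntax
is that of the final XSLT 2.0 Recommendation rather than any particular draft:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <items><item category="..." name="..."/>...</items> -->
  <xsl:template match="items">
    <groups>
      <!-- Grouping: one iteration per distinct @category value -->
      <xsl:for-each-group select="item" group-by="@category">
        <group key="{current-grouping-key()}"
               count="{count(current-group())}">
          <xsl:for-each select="current-group()">
            <!-- Regular expressions: collapse runs of whitespace -->
            <name>
              <xsl:value-of select="replace(@name, '\s+', ' ')"/>
            </name>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>
</xsl:stylesheet>
```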

> 
> This is where political and philosophical considerations 
> enter.  I think that the drafts as currently constituted are 
> a death sentence for userspace XML. Whether you think that is 
> a problem is, as I say, a political issue.  Microsoft loves 
> the idea of obfuscating XML to the point of inutility and the 
> complexity of XSD is just one facet of their push to stub out 
> userspace XML.

I will refrain from speculation on this, partly because I have no idea what
you mean by "userspace".
> 
> I feel strongly that XSL is currently a valuable userspace 
> tool.  This is at least partly a consequence of the relative 
> absence of datatyping.  The essential goal of XML, to my 
> mind, was to get data and process out of the binary silos, 
> out of the hands of ISVs and developers and into the hands of 
> users.  Userspace XML is a three-legged stool and the 
> application support and training legs have long been broken.  
> Deprecating well-formedness as the current drafts effectively 
> do leaves us sitting on the floor.
> 
> If xdt:untypedAtomic is the gesture intended to decouple type 
> annotation from validation, it does not go far enough.

If you could make concrete suggestions as to how it could go further, that
would be very helpful. It's very difficult for a Working Group to respond to
comments expressed in terms of broken three-legged stools.
> 
> The current drafts try to strike a balance but to no purpose. 
>  In the world projected by the May 2 drafts, there are 
> effectively two different species of XML, not two flavors; 
> the two cannot relate directly from the point-of-view of a 
> stylesheet author.  A stylesheet that behaves as expected 
> over one will not necessarily behave the same over the other, 
> including the possibility of run-time failure.  Validated XML 
> and that which is merely well-formed will each require 
> distinct XSL programming idioms.

Clearly the languages are trying to meet a range of different needs. If you
think the design can be improved, without reducing the range of requirements
we are trying to satisfy, then we will welcome concrete suggestions.
> 
> The drafts should be adapted to reflect this distinction.  I 
> feel strongly that, at the least, functions with unsafe 
> behavior in the absence of PSVI type annotation must be noted 
> in the specification.

Please - if you think the behavior is "unsafe" then you must tell us how. Of
course, the language is safer if you do data validation with a schema and
declare function signatures with strong typing, but even if you don't choose
to do this, the language is considerably safer than XSLT 1.0/XPath 1.0. You
obviously have some particular problem in mind here; please tell us what it
is.
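
As a sketch of what declaring a function signature with strong typing can look
like without any schema validation: the `my` namespace, the `my:line-total`
function, and the `line` element with `price` and `qty` attributes are all
invented for this example, and the syntax follows the final XSLT 2.0
Recommendation:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my">

  <!-- The "as" attributes declare the parameter and result types -->
  <xsl:function name="my:line-total" as="xs:decimal">
    <xsl:param name="price" as="xs:decimal"/>
    <xsl:param name="quantity" as="xs:integer"/>
    <xsl:sequence select="$price * $quantity"/>
  </xsl:function>

  <!-- Hypothetical input: <line price="9.99" qty="3"/>, with no schema -->
  <xsl:template match="line">
    <total>
      <!-- Untyped attribute values are cast to the declared types when
           the function is called -->
      <xsl:value-of select="my:line-total(@price, @qty)"/>
    </total>
  </xsl:template>
</xsl:stylesheet>
```

With unvalidated input, a value such as qty="three" should be reported as an
error at the point of the call rather than silently producing NaN, as the
equivalent XPath 1.0 arithmetic would.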

> It would be better still to 
> accommodate the type annotation of XML data external to 
> validation.  Best of all would be to acknowledge that 
> type-annotated XML is a separate and distinct beast with its 
> own infoset; from an XSL perspective, it already is.
> 
Do remember that we are not just trying to handle "structured data" and
"unstructured data". One of the strengths of XML (and XML Schema) is its
recognition that real data is "semi structured". A data model that only
accommodated the two extremes would be seriously weaker than what we have
now.

Michael Kay
Received on Thursday, 15 May 2003 00:11:39 UTC