Re: with-option and other XPath Expressions - Prevent streaming from being possible (??)

From: David A. Lee <dlee@calldei.com>
Date: Fri, 5 Dec 2008 06:08:04 -0500
Message-ID: <CC03B951A6BF4DD5B45B0C549AA00500@calldei.com>
To: "Norman Walsh" <ndw@nwalsh.com>, "XProc Dev" <xproc-dev@w3.org>


>> I'm strongly opposed (on both technical and aesthetic grounds) to
>> adding a new bit of syntax as you propose above.

Agreed; if there is already a syntax for this, why add a new one?

...
> I'm tempted to do a little bit of this myself. Just recognizing the
> expressions ^\$[A-Za-z_][A-Za-z0-9_]*$, ^\'.*\'$, and ^\".*\"$ would
> probably cover 80% of the cases. (It wouldn't obviate the user from
> providing the explicit binding, however.)
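Here is a minimal Python sketch of the heuristic quoted above, for illustration only. The name `obviously_context_free` is hypothetical. The `*` on the variable pattern is included so that names of any length match. The string-literal patterns are tightened to XPath 2.0 literal syntax, where a quote inside a literal is doubled. This matters because the looser `.*` form would also accept `'a', name(), 'b'`, which reads the context.

```python
import re

# Select expressions that can be evaluated without any context document.
# Hypothetical helper for illustration; anything it rejects must still be
# treated as possibly context-dependent.
CONTEXT_FREE_PATTERNS = [
    re.compile(r"^\$[A-Za-z_][A-Za-z0-9_]*$"),  # bare variable reference, e.g. $foo
    re.compile(r"^'(?:[^']|'')*'$"),            # single-quoted literal; '' escapes '
    re.compile(r'^"(?:[^"]|"")*"$'),            # double-quoted literal; "" escapes "
]


def obviously_context_free(select: str) -> bool:
    """Return True only for selects that certainly ignore the context item."""
    expr = select.strip()
    return any(p.match(expr) for p in CONTEXT_FREE_PATTERNS)


print(obviously_context_free("$foo"))               # True
print(obviously_context_free("'yes'"))              # True
print(obviously_context_free("'a', name(), 'b'"))   # False
print(obviously_context_free("concat($a, 'b')"))    # False, although it is context-free
```

The check is one-sided. A False answer only means "don't know", so `concat($a, 'b')` still takes the slow path.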

I thought about this for a while last week: having the parser do static
analysis on the XPath.
Regexes won't cut it. For example, many functions reference the context
implicitly (name(), string(), position(), and so on). You might find a regex
that matches a constant declaration 100% of the time, which is a partial
improvement, but you won't find a regex that decides in every case whether
an expression uses the context.
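To make the point concrete, here is a deliberately naive Python sketch that tries the opposite direction. It flags a select as context-dependent when a regex finds a zero-argument call to one of a few functions that default to the context item. The function list and the name `naive_uses_context` are illustrative assumptions.

The sketch is wrong in both directions:

- It flags a function name that only appears inside a string literal.
- It misses a relative path such as `count(item)`, which is evaluated against the context node.

```python
import re

# A few XPath functions that use the context item (or context position/size)
# when called with no arguments. The list is illustrative, not complete.
ZERO_ARG_CONTEXT_CALL = re.compile(
    r"\b(?:name|local-name|string|normalize-space|string-length|number|position|last)\s*\(\s*\)"
)


def naive_uses_context(select: str) -> bool:
    """Guess whether a select expression reads the context (unreliable)."""
    return ZERO_ARG_CONTEXT_CALL.search(select) is not None


print(naive_uses_context("name()"))       # True  (correct)
print(naive_uses_context("'name()'"))     # True  (wrong: it is only a string literal)
print(naive_uses_context("count(item)"))  # False (wrong: 'item' is a path from the context node)
```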

The only way I can think of to make this work is to have access to the guts
of the actual XPath parser itself, to pre-analyze the string and let you
know whether it accesses the context or not.
I'm not aware of any XPath library where this is currently possible.
Although ... you could actually RUN the expression inside a try/except block,
with no context set, and see if you get exceptions that are useful ... maybe.
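The try/except idea could look roughly like this Python sketch. XPath 2.0 reports an evaluation that needs an absent context item as the dynamic error XPDY0002. Here `ContextItemUndefined` stands in for whatever exception a real XPath library would raise. Expressions are modeled as plain Python functions rather than parsed XPath, so every name below is hypothetical.

The sketch also shows why the answer is only "maybe". A probe observes one evaluation path, so a conditional expression can hide its dependence on the context.

```python
class ContextItemUndefined(Exception):
    """Hypothetical stand-in for XPath 2.0 dynamic error XPDY0002."""


def context_item(ctx):
    # Accessing the context item when none is set raises, as XPDY0002 would.
    if ctx is None:
        raise ContextItemUndefined("XPDY0002: context item is undefined")
    return ctx


def probe_needs_context(expr, variables) -> bool:
    """Evaluate expr with no context item; report whether it failed for that reason."""
    try:
        expr(None, variables)
    except ContextItemUndefined:
        return True
    return False


# Toy "expressions": functions of (context, variables).
def name_expr(ctx, variables):         # like name()
    return context_item(ctx)["name"]


def literal_expr(ctx, variables):      # like 'yes'
    return "yes"


def conditional_expr(ctx, variables):  # like if ($flag) then 'a' else name()
    return "a" if variables["flag"] else context_item(ctx)["name"]


print(probe_needs_context(name_expr, {}))                      # True
print(probe_needs_context(literal_expr, {}))                   # False
print(probe_needs_context(conditional_expr, {"flag": True}))   # False, yet it can read the context
print(probe_needs_context(conditional_expr, {"flag": False}))  # True
```

A failed probe proves dependence on the context. A clean run proves nothing about other variable values.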

Still, I am convinced this is a monumental effort forced on the
implementation for an extremely tiny potential benefit to the user. And this
one sticky point could be what makes it impossible (or so extremely
difficult that it won't ever be done) for XProc to work in a streaming mode
in the majority of cases. That's what I find sad, considering that streaming
is a primary intent of this whole spec.

So much so that I don't intend to support this part of the spec in my
implementation, even at the cost of making the implementation
non-conforming (perhaps as a special flag
"-do-you-really-really-want-this -are-you-really-sure -type-yes" :)

That said, I do realize that true streaming in ANY XML operation is
extremely difficult in practice and sometimes impossible in theory. It
depends on the actual operation; the classic example is count(), which
requires reading the whole document to output one result.
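The count() case can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`. The input is consumed incrementally, and each element's contents are released once it has been seen. Even so, the single result cannot be emitted until the last end tag has been parsed. The helper name `streaming_count` is illustrative.

```python
import io
import xml.etree.ElementTree as ET


def streaming_count(xml_bytes: bytes, tag: str) -> int:
    """Count elements named `tag`, reading the document as a stream of end events."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # drop this element's children and text; they are no longer needed
    # Only here, after the whole document has been read, is the answer known.
    return count


print(streaming_count(b"<doc><item/><other/><item/></doc>", "item"))  # 2
```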

But XML technology is slowly improving day by day and adding more
'streaming-like' features; for example, in Saxon-SA XSLT some operations can
currently be streamed without buffering the entire document. Some of the
XProc atomic operations COULD be streamed, although most of them are
difficult to stream because of limitations in XPath matching: most current
implementation libraries require reading the whole document to produce a
result. But suppose that in the future someone came up with a streaming
XPath library where you could feed it StAX events and get back StAX events
with a marker saying that this one or that one had matched the expression.
Then operations like add-attribute could be implemented as streaming. Not
only streaming, but multi-threaded or multi-processor in a useful way.
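As a rough Python illustration of add-attribute done as streaming, the sketch below uses the standard library's SAX filter classes. Each start-element event is rewritten as it passes through and written out immediately, with no document tree built. Matching is by plain element name, not by a full XSLT match pattern; a streaming pattern matcher of the kind described above would replace that comparison. `AddAttributeFilter` and `add_attribute_streaming` are illustrative names, not part of any XProc implementation.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class AddAttributeFilter(XMLFilterBase):
    """Pass SAX events through, adding an attribute to matching start tags."""

    def __init__(self, parent, match_name, attr_name, attr_value):
        super().__init__(parent)
        self.match_name = match_name  # plain element name, not an XPath pattern
        self.attr_name = attr_name
        self.attr_value = attr_value

    def startElement(self, name, attrs):
        if name == self.match_name:
            new_attrs = dict(attrs)  # copy; SAX attribute objects are read-only
            new_attrs[self.attr_name] = self.attr_value
            attrs = AttributesImpl(new_attrs)
        super().startElement(name, attrs)  # forward the (possibly rewritten) event


def add_attribute_streaming(xml_bytes, match_name, attr_name, attr_value):
    out = io.StringIO()
    reader = AddAttributeFilter(xml.sax.make_parser(), match_name, attr_name, attr_value)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_bytes))
    return out.getvalue()


print(add_attribute_streaming(b'<doc><item/><other/><item a="1"/></doc>', "item", "seen", "yes"))
```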
xmlsh, for example, implements pipeline steps/commands in multiple threads.
This 'feature' of the spec would make that a lot less usable. There are
still operations that could be done in multiple threads, but not the typical
case of an a|b|c|d pipeline with options.
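Here is a toy Python sketch of the threading point. Each step of an a|b|c style pipeline runs in its own thread and hands items on through a queue. All names are hypothetical, and plain strings stand in for XML documents or events.

The two steps behave differently:

- **Fixed parameter:** the step can forward each item as soon as it arrives.
- **Parameter computed from the whole input:** this stands in for an option whose select expression is evaluated against the document on the default readable port. The step cannot emit anything until its input has ended, so everything downstream waits.

```python
import queue
import threading

DONE = object()  # end-of-stream marker


def run_stage(step, inq, outq):
    """Run one pipeline step in its own thread, connected by queues."""
    def worker():
        step(inq, outq)
        outq.put(DONE)
    t = threading.Thread(target=worker)
    t.start()
    return t


def upper(inq, outq):
    # Fixed behaviour: each item is forwarded as soon as it arrives.
    while (item := inq.get()) is not DONE:
        outq.put(item.upper())


def tag_with_total(inq, outq):
    # Parameter computed from the whole input: must buffer everything first.
    buffered = []
    while (item := inq.get()) is not DONE:
        buffered.append(item)
    total = len(buffered)
    for item in buffered:
        outq.put(f"{item}/{total}")


def pipeline(items, steps):
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = [run_stage(s, queues[i], queues[i + 1]) for i, s in enumerate(steps)]
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while (item := queues[-1].get()) is not DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


print(pipeline(["a", "b", "c"], [upper, tag_with_total]))  # ['A/3', 'B/3', 'C/3']
```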

So, given that the building blocks of XProc (XML libraries) are slowly
evolving so that they could be streamable to some extent, I personally find
it disappointing that the XML Pipeline Specification itself is designed in a
way that makes it extremely difficult (or close to impossible) to use that
ability in a vast number of cases. In my opinion, the value to the user of
the feature (context-based XPath expressions in options by default) is
minimal. It puts the onus on the user to go out of their way to tell the
implementation "I don't really need this feature that is the default mode
but slows everything down".

In Conclusion

I'd like to cast a "Formal Vote", if there is such a thing, that the default
case be reversed: if no binding is given for with-option, then the <empty/>
binding is implied. Not doing so will, in my opinion, have a dramatic
negative effect on the real-world usefulness of this spec. It will also
place a severe, unnecessary cap on the possible efficiency of
implementations (without extraordinary efforts), for very limited potential
benefit in syntax to users.
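The proposed change can be stated as a small Python model. The names are hypothetical and do not come from any implementation. The context document for a with-option select comes from its explicit binding if one is given. Otherwise it comes from the default readable port under the current rule, or from an empty binding under the proposal. Only an empty context lets an implementation evaluate the option without first obtaining a document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WithOption:
    name: str
    select: str
    binding: Optional[str] = None  # e.g. "empty" or "pipe"; None when omitted


def context_binding(opt: WithOption, proposed_default: bool) -> str:
    """Where the select expression's context document comes from."""
    if opt.binding is not None:
        return opt.binding
    return "empty" if proposed_default else "default-readable-port"


def needs_context_document(opt: WithOption, proposed_default: bool) -> bool:
    # Any binding other than empty means a document must be supplied before
    # the select can be evaluated (unless the implementation can prove the
    # expression never reads it).
    return context_binding(opt, proposed_default) != "empty"


opt = WithOption(name="attribute-value", select="'yes'")
print(needs_context_document(opt, proposed_default=False))  # True
print(needs_context_document(opt, proposed_default=True))   # False
```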



-David Lee
-----------------------------------------------------------
David A. Lee
Pres, DEI Services Inc.
VP Nexstra, Inc
Sr member of the technical staff, Epocrates, Inc.
dlee@calldei.com
dlee@nexstra.com
dlee@epocrates.com
http://www.calldei.com
http://www.xmlsh.org
http://www.nexstra.com
http://www.epocrates.com
Received on Friday, 5 December 2008 11:08:48 GMT