Re: XProc Usability (was Re: New to Xproc Question : conditionnal "output port" definition?) from Norman Walsh on 2011-10-12 (xproc-dev@w3.org from October 2011)

From: Norman Walsh <ndw@nwalsh.com>
Date: Wed, 12 Oct 2011 15:57:24 -0400
To: XProc Dev <xproc-dev@w3.org>
Message-ID: <m262ju3rvf.fsf@nwalsh.com>

Zearin <zearin@gonk.net> writes:
> Geert’s earlier reply was a good start.  I’ll start with that:
>
>     On Oct 12, 2011, at 10:48 AM, Geert Josten wrote:
>
>         Just jumping in and out in the middle, regarding compactness..
>
>         It would already help a lot if the specs could be tuned such that you can rely more on default behavior,
>         or could write things with fewer characters:
>
>     Yes.

Ok.

>     Having to use p:with-option is so unbelievably annoying!  It’s annoying to write.  It’s annoying to read.
[...]
>     …But then, there’s <p:with-option />.  An extra child element, plus attributes, plus the potential for still
>     more child elements of its own…just because a step’s argument is evaluated at runtime?  Come on!
>
>     Sigh.
>
>     I know there was a reason behind this.  I know it exists for the implementors.
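
To make the complaint concrete, here is a minimal sketch of the two
forms in XProc 1.0. The document shape (/doc with a title child) and
the attribute names are assumptions chosen only for illustration.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Static value: the option can be written as an attribute on the step. -->
  <p:add-attribute match="/doc" attribute-name="status" attribute-value="draft"/>

  <!-- Computed value: the same option now needs a p:with-option child,
       whose select expression is evaluated at runtime. -->
  <p:add-attribute match="/doc" attribute-name="title-length">
    <p:with-option name="attribute-value" select="string-length(/doc/title)"/>
  </p:add-attribute>
</p:pipeline>
```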

I don't think ease of implementation played that big a role in our
decisions. There's a rhythm to the way a working group operates: an
early period of exploration and brainstorming, a longer period of
feature selection and decision making, and then at some point you
realize that you need to declare victory and ship. This is in many
ways the same dilemma that software developers have: there's always
one more bug, one more feature that could go in the next release. At
some point you have to shoot the engineer and ship it.

Some things we decided early, and probably with insufficient care,
when there were much, much bigger problems on the group's plate.
Insisting that options and parameters must be strings is definitely in
that category. AVTs might be in there as well, but I don't recall with
certainty.

Much later in the development cycle, you realize that maybe one of
those early decisions was wrong. But now you're in the "we need to
ship" phase and it's really, really hard to get consensus to reopen
something that feels fundamental.

I tried with the "options as strings" decision because I became
convinced it was a mistake. I even implemented it to demonstrate that
it wasn't an implementation challenge. Nevertheless, I couldn't
persuade the WG to reopen that issue.

I'm not defending the behavior of working groups, just reporting the
facts. I *will* say that reopening issues late can (and sometimes
should) have profound consequences. The sort that can take weeks or
months to work out. So you're fighting two problems: the first being
that no one will ever use your spec if it's never finished and the
second being exhaustion among the WG participants.

As an example of what happens when you don't accept the profound
consequences, I think the XProc spec is a bit slapdash about
c:body/p:data and the whole story about dealing with non-XML data. Our
opening position was: "no, we're not going there". But near the end of
the process, it became clear that that was just not going to be
acceptable. So we patched the holes we could see, but we never went
back and reevaluated all of the decisions we'd made to see if the
result was a cohesive whole. It would have taken weeks or months,
might have turned up new issues that also would have taken weeks or
months, etc.

[...]
>     Would it have caused World War Ⅲ to do this?
>
>         <p:input port="source" empty="true" />

No, but it would have been an ad hoc syntactic shortcut. Something end
users would have to remember as an exception case, a wart in the
design. We tried to avoid those.

>     Or this?
>
>         <p:input port="source" select="empty()" />

Actually, that almost works.

         <p:input port="source" select="()" />

does select an empty sequence and that will make the source empty. But
it still needs to have a binding so it doesn't really do exactly what
you want.
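
For comparison, a minimal sketch of the explicit binding that XProc 1.0
asks for today when you want an empty input. The choice of p:count and
the step name "main" are assumptions made for illustration; with an
empty sequence on its source port, p:count should report a count of 0.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
  <!-- No input ports are declared: the pipeline takes nothing from outside. -->
  <p:output port="result"/>

  <p:count>
    <!-- The explicit binding: an empty sequence of documents. -->
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:count>
</p:declare-step>
```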

Where we currently say it's an error to have an unbound input port, I
think perhaps we should instead say that an unbound input port binds
by default to an empty sequence of documents. That'll generate runtime
errors instead of "compile time" ones in cases where the step in
question doesn't accept a sequence of documents, but for some common
cases, it'll result in less typing.

The question is: is the extra confusion caused by delaying that error
until runtime a net win for users, or a net loss?

>     I’m so sorry, Norm.
>
>     It’s just…XProc makes me angry.

Nothing to be sorry about. I have a thick skin. Naturally, I want to
see XProc improved and I want to see it gain broader adoption. I don't
have (too much of) an ego about the decisions we made. I think your
belief that we were motivated to make implementation easy is mistaken,
but since I'm an implementor, maybe I'm biased.

It's easy to play "what if" games. Pick any decision the WG made and
speculate about what would be different about the language if we'd
made a different decision there and propagated the consequences of
that decision throughout the language. Would the result have been
better or worse? Would the WG ever have finished?

>     (Please don’t hurt me!  ☹  I feel like I have just betrayed a childhood hero…I just never imagined something
>     I looked forward to so much would turn out to be this frustrating to use.)

That's valuable feedback, even if it stings.

> But off the top of my head, some things are:
>
>   • Processing <p:http-request /> results is hard
>       □ This isn’t XProc’s fault—just an unfortunate reality of dealing with tag-soup HTML.  (And Tidy is useless
>         if you want X/HTML5. ☹)

I've moved to Henri Sivonen's HTML5 parser, so that's fixed. And with
http://xproc.org/library/#http-get there's a step you can import that makes
http-request for GET much simpler:

  <l:http-get href="someURI"/>

it returns the results, passing them through unescape-markup if necessary.
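
For contrast, a sketch of roughly what a plain GET looks like with
p:http-request on its own. This is not the library step's
implementation, only the kind of boilerplate it saves; the URL is a
placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:output port="result"/>

  <p:http-request>
    <!-- The request itself is a c:request document on the source port. -->
    <p:input port="source">
      <p:inline>
        <c:request method="GET" href="http://example.com/"/>
      </p:inline>
    </p:input>
  </p:http-request>
</p:declare-step>
```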

>   • Doing anything with the filesystem in XProc is an exercise in masochism.  Having to reconnect an input in the
>     middle of a pipeline whenever I use a <p:store /> makes me want to break things.

You can put all the p:store steps at the end, you know. But you still have
to wire them up, I guess. I can easily write a library step to do identity+store.
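
A minimal sketch of what such an identity+store step might look like.
The step type ex:store-and-pass and its namespace are hypothetical
names invented for this example, not an existing library step. Note
that nothing here forces the store to finish before the document is
passed along.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/xproc-sketch"
                type="ex:store-and-pass" version="1.0" name="main">
  <p:input port="source"/>
  <!-- Pass the input straight through to the result port. -->
  <p:output port="result">
    <p:pipe step="main" port="source"/>
  </p:output>
  <p:option name="href" required="true"/>

  <!-- p:store reads the default readable port, which is main's source. -->
  <p:store>
    <p:with-option name="href" select="$href"/>
  </p:store>
</p:declare-step>
```

Once imported, it could be dropped into the middle of a pipeline as
<ex:store-and-pass href="out/intermediate.xml"/> without rewiring the
next step's input.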

>       □ There have been several times I actually preferred to run the pipeline, wait for Oxygen’s “results” view,
>         and copy-paste the results into a new document.

I don't understand that one.

>       □ The EXProc steps admittedly help a bit.  Why they weren’t included in the core spec is beyond me.

Because there are an endless number of steps that could have been
included and we'd never finish if we didn't draw the line somewhere.

They're in exproc.org now. Would you be happier if they were published
as WG notes to make them more official?

>       □ On the other hand, I kinda wish the EXProc filesystem steps were also available as functions.  That way I
>         could just use them as functions in p:inputs and p:outputs (or use them with p:options in my own steps).

How would you use delete() or head() as functions?

>   • XProc does not lend itself well to “learning by exploration”.  (This is XMLSH’s greatest strength, in my
>     opinion.)  In XProc, I spend too much time trying to understand errors, figuring out how to work around stuff
>     like <p:store />, looking up what steps and options do, and so on.  Before long I’ve forgotten what I started
>     the pipeline for, and instead I’m trying to remember all these other things in order to get it to work.

Sigh.

[...]
> Actually, the compact syntax makes some of the required-verbosity a bit less frustrating.  (But I do mean a bit
> less.  I’d rather be able to just pass in $variable wherever I needed it.)

Jeni's syntax is much, much more like that.

>     And I'm
>     one of the outliers who's unconvinced that moving from an XML
>     pattern-based syntax to a string syntax for XPath (back way before
>     XSLT 1.0 came out) was a good thing. Water under the bridge.
>
> Oh, wow!  The thought of that gives me shudders…
>
> I’m curious—why do you feel this way?

Because a lot of valuable information is locked up in a non-XML syntax
where I can't get at it with my XML tools. And also because XPath is
what introduced QNames-in-content to the world, for which sin I still
feel unclean.

> I don’t have 1337 programming skills, Norm. But I’ve offered to help
> with some grunt-work on your websites/ documentation before. I
> figured you were too busy to respond, but if you’re reading this, the
> offer’s still open. (E-mail me off-list if you’re interested.)

Sorry. I'm sure what happened is I filed that under "great!" but it
drifted out of my consciousness before I could work out how to deal
with the coordination effort.

Sometimes I think I should start trying to "do less" and "lead more",
but that doesn't play to my strengths, alas.

> I enjoyed your talk from XML Prague 2011. If you decide to discuss
> this for 2012, I’d be very interested.

I'll be there to discuss *something* I'm sure. :-)

>            Is there any official work (as in W3C) on improving XProc at the moment?
>
>     No. The XProc WG is not chartered to do new work. We're going to talk
>     informally about next steps at the upcoming f2f in CA. It's a bit of a
>     chicken and egg problem. Trying to get the W3C Membership and our
>     member companies to support work on V.next would be aided by greater
>     adoption of V1.0.
>
> What a Catch-22!  If I want XProc to get better, I have to use the version that makes me want to cry.

Well. There's always the possibility that some rogue implementor will begin experimenting
with the features that would make the language easier to use, even though they're not
officially part of the standard.

                                        Be seeing you,
                                          norm

--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com
Received on Wednesday, 12 October 2011 19:58:01 UTC