RE: Loop over URLs with http-request...how? from Philip Fennell on 2010-11-10 (xproc-dev@w3.org from November 2010)

From: Philip Fennell <Philip.Fennell@marklogic.com>
Date: Wed, 10 Nov 2010 13:03:47 -0800
To: XProc Dev <xproc-dev@w3.org>
Message-ID: <D20C296D14127D4EBD176AD949D8A75A46F5C514@EXCHG-BE.marklogic.com>
Tony,

I've tried running the pipeline I sent you (even copy 'n' pasting it from the e-mail) in oXygen, which, according to the website, uses Calabash 0.9.23, and it worked fine for me on the first link and on all the others. It took about four minutes and gathered 29664 lines of mark-up. Why it didn't work for you, I don't know. Can you remember what the errors were?

> I would love your insight as to how to write this sort of thing more efficiently.

There's nothing inherently wrong with your version. You've chosen to 'transform' the link element into a c:request, add the missing attributes and remove the title. I created a new element and added an href attribute to it with the URI from the link.

'Same drink, different bottle'.
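For anyone following this thread in the archive, here is a minimal sketch of the new-element approach in XProc 1.0. It assumes (hypothetically) that the source document holds link elements in no namespace, each with an absolute URI in an href attribute; adjust the select expressions to your actual mark-up. The c:request is built from a literal p:inline via p:identity, and the URI is added afterwards with p:add-attribute, because nothing inside p:inline is evaluated.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <!-- Hypothetical input: link elements in no namespace, each with an absolute href -->
  <p:for-each>
    <p:iteration-source select="//link[@href]"/>
    <p:output port="result" sequence="true"/>
    <!-- Evaluated against the current link, outside any p:inline -->
    <p:variable name="uri" select="/link/@href"/>

    <!-- Literal template: nothing inside p:inline is evaluated -->
    <p:identity>
      <p:input port="source">
        <p:inline>
          <c:request method="GET"/>
        </p:inline>
      </p:input>
    </p:identity>

    <!-- Copy the computed URI onto the request as its href -->
    <p:add-attribute match="/c:request" attribute-name="href">
      <p:with-option name="attribute-value" select="$uri"/>
    </p:add-attribute>

    <!-- One result document per link -->
    <p:http-request/>
  </p:for-each>
</p:declare-step>
```

If the href values are relative, they would need resolving first, for example with resolve-uri() in the p:variable select.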

I don't think you need the 'local' namespace, the empty namespace will do fine.

The only simplification you could use is the newly suggested p:document-template:

<http://norman.walsh.name/2010/11/07/xproc-document-templates>

but that's non-standard at present.

> Personally, XProc is delightfully much higher-level than XSL, but sometimes
> I feel like I get caught on the dumbest, tiniest details and get delayed forever
> on what I thought would be a trivial task.  I guess that comes with the territory
> of choosing a language in its infancy as my weapon of choice.  :)

No, I think the issue is that XProc requires a different way of looking at the problem you're trying to solve. IMHO, it's not because it's in its infancy; it's because it is not like other languages.

> XProc really needs to start building some libraries akin to Python's packages,
> PHP's PEAR modules, or Ruby's Gems.  Something central, highly visible, and
> useful for solving the drudgery and gruntwork coding tasks.

There's nothing stopping people from doing that, but these things tend to take time. I've created libraries of steps for a variety of tasks, including:

* Recursive descent of directory trees
* Loading content into MarkLogic
* Configuring MarkLogic (A DSL built from XProc steps)
* Deploying code to MarkLogic
* AtomPub loading/testing

Getting around to publishing them somewhere tends to be the last thing on one's mind. But, having said that, it somewhat shames me into thinking that I should do it! I think someone on this list previously suggested such a site. I'll have to search the archive and see if I can find it. Failing that, one of the usual public project sites like Google Code, GitHub or SourceForge would do.


> Thanks again SO much for your help.  You have saved me many hours of stress
> and probably a letter grade or two for this project.

Delighted :)


Regards

Philip

From: Tony Rogers [mailto:tony@gonk.net]
Sent: 10 November, 2010 7:57 AM
To: Philip Fennell
Subject: Re: Loop over URLs with http-request...how?


On Nov 10, 2010, at 2:25 AM, Philip Fennell wrote:


> Tony,
>
> I'll be without internet access for most of this morning.
Morning?!  It's past midnight here!  I'm only reading this because of insomnia.  :)

(I'm in the Eastern United States.  And I didn't vote for Bush, so please don't hate me.  0;-)


> I'll have a look again this afternoon.
I would love your insight as to how to write this sort of thing more efficiently.

Personally, XProc is delightfully much higher-level than XSL, but sometimes I feel like I get caught on the dumbest, tiniest details and get delayed forever on what I thought would be a trivial task.  I guess that comes with the territory of choosing a language in its infancy as my weapon of choice.  :)

XProc really needs to start building some libraries akin to Python's packages, PHP's PEAR modules, or Ruby's Gems.  Something central, highly visible, and useful for solving the drudgery and gruntwork coding tasks.

<rant />.



> However, looking at your example, I can see the error message is correct. The way you construct the c:request won't work because the p:with-option is inside the p:inline and therefore won't be evaluated.
OOOHHHHHHhh, so THAT'S what was going on!

...Oh.  Yeah.  That makes perfect sense.  *doh!*

Huge help, that.  Thanks!



> You do need to construct the c:request, as I did, before the p:http-request and use the p:add-attribute step to add the href attribute.
>
> When I said I'd tested it on the first link, I found that my pipeline was able to retrieve that page. However, when I ran it against all the links it took a very long time to finish.
>
> Are you sure that all the pages can be retrieved? Have a look at the Calabash extensions; there's a timeout attribute for http-request. Try that and see if the pipeline completes.

Dunno if you got my later email, but I eventually got the pipeline working, although it took some very ugly hackery.  I hate resorting to ugly code because it makes me never want to come back to it when I'm done, but I suppose making the class deadline is worth it.  For now.  Ha.

Well that ends my stream-of-consciousness insomniac email.  Thanks again SO much for your help.  You have saved me many hours of stress and probably a letter grade or two for this project.  If I had money, I'd send you enough to go buy a drink.  :)

Nite!

-Tony
Received on Wednesday, 10 November 2010 21:04:14 UTC