Re: stitching together APIs

On Oct 2, 2012, at 9:06 PM, David Skinner <deskinner@lbl.gov> wrote:

> 
> 
> On Tue, Oct 2, 2012 at 1:13 PM, Rion Dooley <deardooley@gmail.com> wrote:
> I agree with David's general point, though I'm wary of issues creeping in from different OSes and impacting the evaluation of the API itself. Perhaps using something like Cloud Foundry or even a VM with a pretty vanilla stack would serve the same purpose and level the playing field some. Also, are we assuming the API is running on the HPC system or as a hosted service?
> 
> 
> Sure. There is something I like about the simplicity of the tarball, configure, make, run all being local, but I also see your points about client support. I think it's safe to say that the target platform is ultimately a Linux node sitting close to the batch system and filesystems of a big Linux machine. That's something you can reliably mimic on, say, an Apple laptop or a server node. Am I missing other platforms that we'd want to support easy test drives on?
> 
> As for hosting, so far the assumption has mostly been that it runs at the center. Some places like LLNL would have big issues with hosting-out access to the big iron. I'd let them speak to that but that's my guess. BTW, GlobusOnline has a hosted approach for data movement, an associated REST API, etc. My feeling is that for execution and job control the HPC system based approach has some heavy upsides. What do others think?

I see your point about LLNL and other high security situations, but it's fair to consider the many centers where admins simply won't allow services like these to run on their head nodes. Also, are we assuming it runs as root, or in user space?

> 
> Alternatively we (NERSC, TACC, etc.) could be the cloud test/dev space for the API. Sufficiently stubbed out, it would be hard for people to make trouble with, and it would be a zero-step install. Think of those CMS demo pages that let you log in to test drive. Anything that makes it easier for people to get a taste of HPC on the web I am all for considering. I suggest that if we go with Cloud Foundry, NERSC/TACC, or whatever, we step through what we're asking newcomers to do in order to try it out. This will appeal both to the application/user folks who are interested in HPC web interface options and to the facilities/center people who are going to evaluate whether they can live with it.

One of the benefits of running on Cloud Foundry or any other PaaS would be that the API services could be deployed to any "cloud" as well as to the desktop.

> 
> One of the icebergs that sank the grid ship was the difficulty of getting software and services up and running. Highly layered middleware, hard installs, pages of XML configs, etc. I am against all that stuff.

Amen. So at the very least we'll need a clean web app with a simple wizard/form to configure the scheduler, account types, auth mechanism, default data protocol and file system, admin accounts, and some performance preferences. What else have I missed?
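
Just to make that concrete, here's the kind of thing I'd expect such a wizard to capture. This is purely illustrative; every key name below is made up, not a proposal for the spec:

    # Purely illustrative sketch of what an install wizard might write out;
    # every key name here is hypothetical, not part of NEWT, AGAVE, or any spec.
    site_config = {
        "scheduler":      "slurm",            # or pbs, sge, condor, ...
        "account_types":  ["user", "staff", "admin"],
        "auth_mechanism": "ldap",             # or shadow, myproxy, oauth, ...
        "default_data": {
            "protocol":   "sftp",             # gridftp, irods, ... added per site
            "filesystem": "/global/scratch",
        },
        "admin_accounts": ["apiadmin"],
        "performance": {
            "max_concurrent_transfers": 4,
            "job_status_poll_seconds":  30,
        },
    }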

> 
> As for names, NERSC Web Toolkit has no great value attached to it. We discussed with LLNL and others that it could be the Nice and Easy Web Toolkit, or whatever else. Again, if the name can be changed to make the API non-territorial in order to increase adoption, I am all for it. I have no idea what it means, but I think Globus was a good choice. Big. Round. Inclusive.
> 
> The auth, data, jobs, and metadata services seem to be a good starting place. We might also want some information services such as system discovery and monitoring. Given that this is meant to drive web apps…and hopefully future ones, perhaps supporting event and pub/sub services would also be helpful. Lastly, is the API in charge of monitoring itself, or are we assuming that's a production detail the centers would implement themselves? One of the things we've done with AGAVE is provide both real-time and historical uptime reports for our users. This service is deployed outside the API, and it lets us know the ongoing stability of our services and of the systems and services we depend on. We find that it also helps build trust with our users. I'm not sure this service is really in the scope of the API, but it's one of those things we never knew we were missing until we had it. What are other people's thoughts on this?
> 
> 
> I'd go for monitoring as a core topic. Job status already is (GET on jobid), as is queue monitoring (GET on system). Monitoring is something my group does a lot of (app perf, FS perf, power/env monitoring, etc.), so I know that scope creep is very possible here. What about these topics?
> 

I suppose this depends on what the purpose of the API is. If it's a target for developers to build apps that let end users access the resources and conduct science from the web, then providing sysadmin tools might be overkill. I'm not sure our next chemistry gateway would see much value in having access to power consumption stats on each job. It would make for some great visualizations, though. The question seems to be who our target audience is.


> system monitoring: uptime, core count, #people logged in, date of deployment, pub/sub on outages to steer workflow automation (back off when an outage is announced)
> 

This all seems like good info. Do we envision supporting single or multi-tenancy? If running on the HPC system, do we foresee API sessions being tied to system sessions?
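
For the sake of discussion, a response to that kind of system-status GET might look something like the following. Every field name here is invented, just to give us something to argue about; it simply mirrors the list David sketched above:

    # Invented example of a system-status response; none of these fields exist
    # in NEWT or AGAVE today.
    example_system_status = {
        "system": "examplehpc",
        "up": True,
        "core_count": 100000,
        "users_logged_in": 250,
        "deployed_on": "2012-09-01",
        "outages": [
            {"start": "2012-10-15T08:00:00Z", "type": "scheduled", "note": "quarterly maintenance"},
        ],
    }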


> self monitoring: introspection on the number of sessions and recent API activity; could be admin-only.

> 
> FS monitoring: df for the web
> 
> data transfer monitoring: maybe this is GO's territory, not ours?

There are some significant auth challenges to doing this in the general case, and if we want to ship something as a deployed solution, we need to support sftp, ftp, fops, gridftp, and irods out of the box so users can access data however they see fit.
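
Even a thin protocol abstraction would go a long way here. Something like the sketch below is roughly what I have in mind; the class and method names are hypothetical, and only the local-copy adapter actually does anything:

    # Hypothetical sketch of a pluggable data-protocol layer; names are
    # illustrative only. Real sftp/gridftp/irods adapters would slot in
    # behind the same interface.
    import shutil
    from abc import ABC, abstractmethod

    class DataProtocol(ABC):
        @abstractmethod
        def get(self, remote_path: str, local_path: str) -> None: ...

        @abstractmethod
        def put(self, local_path: str, remote_path: str) -> None: ...

    class LocalCopy(DataProtocol):
        """Stub adapter: a 'transfer' is just a local file copy."""
        def get(self, remote_path, local_path):
            shutil.copyfile(remote_path, local_path)

        def put(self, local_path, remote_path):
            shutil.copyfile(local_path, remote_path)

    # The REST layer would pick an adapter from the URL scheme or site config.
    PROTOCOLS = {"local": LocalCopy()}  # sftp, ftp, gridftp, irods registered the same way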

> 
> ...one more topic...
> 
> At the risk of bloating the set of tasks ahead, I am leaning towards the notion that task queues may also be a core concept. That gets us into the wild and woolly space of workflows, but relying on the HPC batch queue system to delineate a set of steps to be done is failing, IMO, at our site at least. Batch queues don't scale for this and their latency is too high. There are big wins for providing assistance to science teams who have 10^6 "things" they need to do "M at a time" and currently have no great solutions except writing their own control loops. So while I see a pressing need there, I am not 100% sure that NEWT/AGAVE etc. is the right place for it.
>  

Were you thinking of pulling from an existing workflow project to implement this? Do you have a preference? AGAVE is all queue-driven on the backend, so if you wanted to reuse the preprocessing mechanism from our IO and Data services that chains together a series of data transforms to process data as it passes in and out of our Data Store, I'm happy to donate the code. While I don't think it's exactly what you're describing, it gets us part of the way there, and it's built on the Quartz framework, so the technology is well established. It might be a starting point. Another option might be looking at Airavata, Apache ODE, Spring, jBPM, etc., given that they already have well-defined mechanisms for doing this.
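
To ground the task-queue idea a bit: the core pattern is just a bounded worker pool draining a queue of independent tasks. The toy sketch below is not AGAVE code and not Quartz; it's only meant to illustrate the "10^6 things, M at a time" control loop you described:

    # Toy illustration of the "10^6 things, M at a time" pattern. Not AGAVE
    # code; a real service would persist the queue, capture status, and retry.
    import queue
    import subprocess
    import threading

    def run_tasks(commands, workers=4):
        """Drain a queue of shell commands with at most `workers` running at once."""
        q = queue.Queue()
        for cmd in commands:
            q.put(cmd)

        def worker():
            while True:
                try:
                    cmd = q.get_nowait()
                except queue.Empty:
                    return
                subprocess.run(cmd, shell=True)
                q.task_done()

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    if __name__ == "__main__":
        run_tasks(["echo task %d" % i for i in range(10)], workers=3)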

What kind of interoperability are we targeting with these services? Are we striving for near-compliance with standards (given that full compliance probably isn't reasonable), or for usability?

> Let's keep the conversation going. I am available to chat pretty much anytime at 510-486-4748 if you have ideas or if what I said here is unclear.
> 
> Cheers, 
> 
> David
> 
> https://foundation.iplantcollaborative.org/monitor/history
> 
> --
> Rion
> @deardooley
> 
> On Oct 2, 2012, at 1:24 PM, David Skinner <deskinner@lbl.gov> wrote:
> 
>> 
>> 
>> On Tue, Oct 2, 2012 at 11:11 AM, Annette Greiner <amgreiner@lbl.gov> wrote:
>> Hi folks,
>> To frame the discussion for the October 11 conference call, I've started thinking about how to go about putting together a first draft of a standard API. It seems to me that it would be logical to simply blend the two APIs we currently have, NEWT and the iPlant API (Agave). There's a lot they have in common, though of course they have different terms for things. I would suggest we choose our terms based on four principles:
>> coherence: terms in the API should have grammatical commonality with other terms of similar function in the API
>> clarity: terms should be unambiguous
>> memorability: terms should be easy to associate mentally with their meaning in the API
>> cross-center generalizability: terms should make sense in the context of any HPC center
>> 
>> 
>> Good points. One step toward the last one is to make a fake HPC center stubbed out in the software itself. This serves two purposes: 1) you get to try the software or develop on your laptop without touching the guts of your HPC center, and 2) it provides a common meeting ground for all of us as a plain-vanilla idealization of an HPC center. To be a little more specific, I am suggesting that auth, data, and job functions should have stub implementations that operate locally and, while ineffectual, are processed in a way that mimics a real HPC center.
>> 
>> auth: just use an install-time configured password with a test user
>> data: just move local files on disk 
>> jobs: just run the command (fork/exec). 
>> KVP store: use a couch or mongo local instance. 
>> 
>> Once we have that stub implementation down and packaged, people can download and try the API without herculean efforts.
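>> 
>> To make that concrete, here is a rough, untested sketch of the jobs stub: accept a command over HTTP and just fork/exec it locally. The endpoint path and JSON fields are placeholders, not a proposed spec.
>> 
>>     # Rough sketch of the "jobs" stub; endpoint path and fields are placeholders.
>>     import json
>>     import subprocess
>>     from http.server import BaseHTTPRequestHandler, HTTPServer
>> 
>>     class JobsStub(BaseHTTPRequestHandler):
>>         def do_POST(self):
>>             if self.path != "/jobs":
>>                 self.send_error(404)
>>                 return
>>             length = int(self.headers.get("Content-Length", 0))
>>             req = json.loads(self.rfile.read(length))
>>             # No batch system here: just run the command locally, the way the
>>             # real service would hand it off to the scheduler.
>>             proc = subprocess.run(req["command"], shell=True,
>>                                   capture_output=True, text=True)
>>             body = json.dumps({"status": "done", "stdout": proc.stdout}).encode()
>>             self.send_response(200)
>>             self.send_header("Content-Type", "application/json")
>>             self.end_headers()
>>             self.wfile.write(body)
>> 
>>     if __name__ == "__main__":
>>         HTTPServer(("localhost", 8080), JobsStub).serve_forever()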
>> 
>> We'll also need to discuss the scope of the standard API. How much should it cover? Clearly, centers should be free to do their own implementations; we are just defining a set of REST calls that can be re-used across implementations. But what functions should be left out of the standard? I'm thinking here of functions that are not specific to HPC. One example is the iPlant PostIt, which generates disposable URLs. I think that's a great service to offer people, but I would suggest we leave it out of a standard for HPC, since it isn't a function that arises from the HPC context. The iPlant Apps and Profile features strike me similarly. NEWT has a liststore feature that could also be seen as a non-HPC aspect of that API.
>> 
>> 
>> The guiding model for NEWT thus far has been to stick to the core things you see in HPC center documentation. How do I log in, how do I move files, how do I run things. We don't need to be rigid about that but having a guiding principle with a decent level of simplicity seems prudent. 
>>  
>> We've also advocated an exception mechanism whereby you can step outside the API and do whatever you like. That provides some demarcation as to where the API stops and where custom machinery begins. 
>> 
>> -David
>> 
>> What do other people think? How should we define what is in/out of the spec?
>> -Annette
>> --
>> Annette Greiner
>> Outreach, Software, and Programming Group
>> NERSC, LBNL
>> amgreiner@lbl.gov
>> 510-495-2935
>> 

Received on Wednesday, 3 October 2012 15:14:02 UTC