Re: stitching together APIs

I tend to think that, despite the amount of site-specificity, there is some useful subset of calls that could be standardized across implementations. Much of the discussion in the last day has been about a specific possible implementation, which I think is a good thing to build, but we need to make sure we are clear on what we are trying to do with it. My thought all along has been that if we had a standard API, it should be possible to build an app that uses one center's API and then switch it to run at another center with as little alteration as possible. It should also be easy to learn to use other centers' APIs and to build applications that invoke more than one center's API. I see building an implementation (or several) of the standard mostly as proof that the standard works. In fact, that is typically part of the development of a W3C recommendation, though that's still pretty far down the road for us (and we may choose not to go that route at all). I wasn't really aiming for interoperability across sites within a single API implementation.
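
To make that concrete, here's roughly the kind of portability I have in mind (the endpoint path, base URLs, and the submit_job helper below are all hypothetical, not a proposal):

    import requests

    # The same client code should work against any center that implements the
    # standard; only the base URL (and credentials) should need to change.
    CENTER_A = "https://center-a.example.org/api/v1"
    CENTER_B = "https://center-b.example.org/api/v1"

    def submit_job(base_url, token, job_description):
        """Submit a job to whichever center base_url points at."""
        resp = requests.post(
            base_url + "/jobs",
            headers={"Authorization": "Bearer " + token},
            json=job_description,
        )
        resp.raise_for_status()
        return resp.json()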
-Annette

On Oct 3, 2012, at 10:14 AM, Shreyas Cholia wrote:

> Sorry for jumping in late but I'm wondering if it might be worth taking a step back.
> 
> In my experience, HTTP APIs tend to be driven by the requirements and specific use cases behind them. In other words, the form and semantics of a particular API tend to reflect the underlying needs of the services it exposes and the applications using it.
> 
> The flip side of this is that there has been very little standardization (deliberately, IMO) in the W3C space in terms of what an HTTP API should look like. Moreover, most communities tend to shy away from defining a generalized HTTP API - there are too many details that end up being application-specific.
> 
> I'd like to pose a couple of questions:
> 1. What is the purpose of a common HTTP API for HPC when many of the semantics are site- and application-specific?
> 2. Is interoperability a realistic goal (cf. "The Grid")? Is it enough that everyone just speaks HTTP?
> 
> I'm wondering if it might make more sense to work on a best practices type document. Something that says - Here are some things to consider when defining your own HTTP API, here are some sample APIs to use as a baseline. This might turn into a reference API that serves as a starting point for others.
> 
> This may be where we are headed anyway, in which case feel free to carry on.
> 
> -Shreyas
> 
> On Wed, Oct 3, 2012 at 8:09 AM, Rion Dooley <deardooley@gmail.com> wrote:
> 
> On Oct 2, 2012, at 9:06 PM, David Skinner <deskinner@lbl.gov> wrote:
> 
>> 
>> 
>> On Tue, Oct 2, 2012 at 1:13 PM, Rion Dooley <deardooley@gmail.com> wrote:
>> I agree with David's general point, though I'm wary of issues creeping in from OS differences impacting the evaluation of the API itself. Perhaps using something like CloudFoundry or even a VM with a pretty vanilla stack would serve the same purpose and level the playing field some. Also, are we assuming the API is running on the HPC system or as a hosted service?
>> 
>> 
>> Sure. There is something I like about the simplicity of the tarball, configure, make, run all being local, but I also see your points about client support. I think it's safe to say that the target platform is ultimately a Linux node sitting close to the batch and filesystems of a big Linux machine. That's something you can reliably mimic on, say, an Apple laptop or a server node. Am I missing other platforms that we'd want to support easy test drives on?
>> 
>> As for hosting, so far the assumption has mostly been that it runs at the center. Some places like LLNL would have big issues with hosting out access to the big iron. I'd let them speak to that, but that's my guess. BTW, GlobusOnline has a hosted approach for data movement, an associated REST API, etc. My feeling is that for execution and job control the HPC-system-based approach has some heavy upsides. What do others think?
> 
> I see your point about LLNL and other high security situations, but it's fair to consider the many centers where admins simply won't allow services like these to run on their head nodes. Also, are we assuming it runs as root, or in user space?
> 
>> 
>> Alternatively, we (NERSC, TACC, etc.) could be the cloud test/dev space for the API. Sufficiently stubbed out, it would be hard for people to make trouble with, and it would be a zero-step install. Think of those CMS demo pages that let you log in to test drive. Anything that makes it easier for people to get a taste of HPC on the web, I am all for considering. I suggest that if we go with Cloudfoundry, NERSC/TACC, or whatever, we step through what we're asking newcomers to do in order to try it out. This will appeal both to the application/user folks who are interested in HPC web interface options and to the facilities/center people who are going to evaluate whether they can live with it.
> 
> One of the benefits of running on cloud foundry or any other PaaS would be that the API services could be deployed to any "cloud" as well as the desktop. 
> 
>> 
>> One of the icebergs that sank the grid ship was difficulty in getting software and services up and running. Highly layered middleware, hard installs, pages of XML configs, etc. I am against all that stuff.
> 
> Amen. So at the very least we'll need a clean web app with a simple wizard/form to configure the scheduler, account types, auth mechanism, default data protocol and file system, admin accounts, and some performance preferences. What else have I missed?
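> 
> Just to make the shape of that concrete, I imagine the wizard spitting out something like this (field names are invented, purely illustrative, not a proposed schema):
> 
>     # Hypothetical output of the setup wizard -- nothing here is a real schema.
>     site_config = {
>         "scheduler": "slurm",              # or pbs, sge, ...
>         "auth": "ldap",                    # whatever the center already runs
>         "account_types": ["user", "staff", "admin"],
>         "default_data_protocol": "sftp",
>         "default_filesystem": "/global/scratch",
>         "admins": ["admin@example.org"],
>         "performance": {"max_concurrent_transfers": 8},
>     }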
> 
>> 
>> As for names, NERSC Web Toolkit has no great value attached to it. We discussed with LLNL and others that it could be the Nice and Easy Web Toolkit, or whatever else. Again if the name can be changed to make the API non-territorial in order to increase adoption I am all for it. I have no idea what it means but I think Globus was a good choice. Big. Round. Inclusive. 
>> 
>> The auth, data, jobs, and metadata services seem to be a good starting place. We might also want some information services such as system discovery and monitoring. Given that this is meant to drive web apps…and hopefully future ones, perhaps supporting event and pub/sub services would also be helpful. Lastly, is the API in charge of monitoring itself, or are we assuming that's a production detail the centers would implement themselves? One of the things we've done with AGAVE is provide both real-time and historical uptime reports for our users. This service is deployed outside the API, and it lets us know the ongoing stability of our services and of the systems and services we depend on. We find that it also helps build trust with our users. I'm not sure this service is really in the scope of the API, but it's one of those things we never knew we were missing until we had it. What are other people's thoughts on this?
>> 
>> 
>> I'd go for monitoring as a core topic. Job status already is (GET on a job id), as is queue monitoring (GET on a system). Monitoring is something my group does a lot of (app perf, FS perf, power/env monitoring, etc.), so I know that scope creep is very possible here. What about these topics?
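>> 
>> Roughly the shape I'm picturing (the paths and system name below are hypothetical, just to show the idea):
>> 
>>     import requests
>> 
>>     BASE = "https://example-hpc-api.example.org/api/v1"  # hypothetical base URL
>> 
>>     def job_status(jobid):
>>         # Job status: GET on a job id
>>         return requests.get(BASE + "/jobs/" + str(jobid)).json()
>> 
>>     def queue_status(system):
>>         # Queue monitoring: GET on a system
>>         return requests.get(BASE + "/systems/" + system + "/queues").json()
>> 
>>     def system_status(system):
>>         # System monitoring: uptime, core count, #people logged in, etc.
>>         return requests.get(BASE + "/systems/" + system + "/status").json()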
>> 
> 
> I suppose this depends on what the purpose of the API is. If it's a target for developers to build apps that let end users access the resources and conduct science from the web, then providing sysadmin tools might be overkill. I'm not sure our next chemistry gateway would see much value in having access to power consumption stats for each job. It would make for some great visualizations, though. The question seems to be who our target audience is.
> 
> 
>> system monitoring: uptime, core count, #people logged in, date of deployment, pub/sub on outages to steer workflow automation (back off when an outage is announced)
>> 
> 
> This all seems like good info. Do we envision supporting single or multi-tenancy? If running on the HPC system, do we foresee API sessions being tied to system sessions?
> 
> 
>> self monitoring: introspection on the number of sessions and recent API activity; could be admin-only.
> 
>> 
>> FS monitoring: df for the web
>> 
>> data transfer monitoring: maybe this is GO's territory, not ours?
> 
> There are some significant auth challenges to doing this in the general case, and if we want to ship something as a deployed solution, we need to support sftp, ftp, fops, gridftp, and irods out of the box so users can access data however they see fit.
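> 
> Something like a thin dispatch layer is probably enough for that (sketch only; the handler functions below are placeholders I'm making up):
> 
>     from urllib.parse import urlparse
> 
>     # Placeholder transfer handlers -- each would wrap the real protocol client.
>     def copy_sftp(src, dst): pass
>     def copy_gridftp(src, dst): pass
>     def copy_irods(src, dst): pass
> 
>     HANDLERS = {"sftp": copy_sftp, "gridftp": copy_gridftp, "irods": copy_irods}
> 
>     def transfer(src_uri, dst_uri):
>         """Route a transfer to the handler for the source URI's scheme."""
>         scheme = urlparse(src_uri).scheme
>         if scheme not in HANDLERS:
>             raise ValueError("unsupported protocol: " + scheme)
>         return HANDLERS[scheme](src_uri, dst_uri)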
> 
>> 
>> ...one more topic...
>> 
>> At the risk of bloating the set of tasks ahead, I am leaning towards the notion that task queues may also be a core concept. That gets us into the wild and woolly space of workflows, but relying on the HPC batch queue system to delineate a set of steps to be done is failing, IMO, at our site at least. Batch queues don't scale for this, and their latency is too high. There are big wins in providing assistance to science teams who have 10^6 "things" they need to do "M at a time" and currently have no great solutions except writing their own control loops. So while I see a pressing need there, I am not 100% sure that NEWT/AGAVE etc. is the right place for it.
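>> 
>> For what it's worth, the control loop teams hand-roll today looks roughly like this (sketch only; run_task is a placeholder):
>> 
>>     from concurrent.futures import ThreadPoolExecutor
>> 
>>     def run_task(i):
>>         pass  # placeholder: submit a job, move a file, etc.
>> 
>>     N = 10**6   # total "things" to do
>>     M = 32      # how many to keep in flight at once
>> 
>>     # Keep at most M tasks running at a time until all N are done --
>>     # exactly the kind of plumbing a task-queue service could absorb.
>>     with ThreadPoolExecutor(max_workers=M) as pool:
>>         for _ in pool.map(run_task, range(N)):
>>             pass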
>>  
> 
> Were you thinking of pulling from an existing workflow project to implement this? Do you have a preference? AGAVE is all queue-driven on the back end, so if you wanted to reuse the preprocessing mechanism from our IO and Data services that chains together a series of data transforms to process data as it passes in and out of our Data Store, I'm happy to donate the code. While I don't think it's exactly what you're describing, it gets us part of the way there, and it's built on the Quartz framework, so the technology is well established. It might be a starting point. Another option might be looking at Airavata, Apache ODE, Spring, jBPM, etc., given they already have well-defined mechanisms for doing this.
> 
> What kind of interoperability are we targeting with these services? Are we striving for near-compliance with standards (given that full compliance probably isn't realistic), or for usability?
> 
>> Let's keep the conversation going. I am available to chat pretty much anytime at 510-486-4748 if you have ideas or what I said here is unclear. 
>> 
>> Cheers, 
>> 
>> David
>> 
>> https://foundation.iplantcollaborative.org/monitor/history
>> 
>> --
>> Rion
>> @deardooley
>> 
>> On Oct 2, 2012, at 1:24 PM, David Skinner <deskinner@lbl.gov> wrote:
>> 
>>> 
>>> 
>>> On Tue, Oct 2, 2012 at 11:11 AM, Annette Greiner <amgreiner@lbl.gov> wrote:
>>> Hi folks,
>>> To frame the discussion for the October 11 conference call, I've started thinking about how to go about putting together a first draft of a standard API. It seems to me that it would be logical to simply blend the two APIs we currently have, NEWT and the iPlant API (Agave). There's a lot they have in common, though of course they have different terms for things. I would suggest we choose our terms based on four principles:
>>> coherence: terms in the API should have grammatical commonality with other terms of similar function in the API
>>> clarity: terms should be unambiguous
>>> memorability: terms should be easy to associate mentally with their meaning in the API
>>> cross-center generalizability: terms should make sense in the context of any HPC center
>>> 
>>> 
>>> Good points. One step toward the last one is to make a fake HPC center stubbed out in the software itself. This serves two purposes: 1) you get to try the software or develop on your laptop without touching the guts of your HPC center, and 2) it provides a common meeting ground for all of us as a plain vanilla idealization of an HPC center. To be a little more specific, I am suggesting that auth, data, and job functions should have stub implementations that operate locally and, while ineffectual, are processed in a way that mimics a real HPC center.
>>> 
>>> auth: just use an install-time configured password with a test user
>>> data: just move local files on disk 
>>> jobs: just run the command (fork/exec). 
>>> KVP store: use a couch or mongo local instance. 
>>> 
>>> Once we have that stub implementation down and packaged, people can download and try the API without herculean effort.
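>>> 
>>> For the jobs stub, I'm imagining something as simple as this (a sketch; names are placeholders):
>>> 
>>>     import datetime, subprocess, uuid
>>> 
>>>     # Stub "jobs" service: no batch system, just fork/exec locally, but
>>>     # return the same kind of record a real center would.
>>>     def submit_stub_job(command):
>>>         proc = subprocess.Popen(command, shell=True)
>>>         return {
>>>             "jobid": str(uuid.uuid4()),
>>>             "status": "RUNNING",
>>>             "pid": proc.pid,
>>>             "submitted": datetime.datetime.utcnow().isoformat(),
>>>         }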
>>> 
>>> We'll also need to discuss the scope of the standard API. How much should it cover? Clearly, centers should be free to do their own implementations; we are just defining a set of REST calls that can be re-used across implementations. But what functions should be left out of the standard? I'm thinking here of functions that are not specific to HPC. One example is the iPlant PostIt, which generates disposable URLs. I think that's a great service to offer people, but I would suggest we leave it out of a standard for HPC, since it isn't a function that arises from the HPC context. The iPlant Apps and Profile features strike me similarly. NEWT has a liststore feature that could also be seen as a non-HPC aspect of that API.
>>> 
>>> 
>>> The guiding model for NEWT thus far has been to stick to the core things you see in HPC center documentation. How do I log in, how do I move files, how do I run things. We don't need to be rigid about that but having a guiding principle with a decent level of simplicity seems prudent. 
>>>  
>>> We've also advocated an exception mechanism whereby you can step outside the API and do whatever you like. That provides some demarcation as to where the API stops and where custom machinery begins. 
>>> 
>>> -David
>>> 
>>> What do other people think? How should we define what is in/out of the spec?
>>> -Annette
>>> --
>>> Annette Greiner
>>> Outreach, Software, and Programming Group
>>> NERSC, LBNL
>>> amgreiner@lbl.gov
>>> 510-495-2935
>>> 
>> 
>> 
> 
> 

--
Annette Greiner
Outreach, Software, and Programming Group
NERSC, LBNL
amgreiner@lbl.gov
510-495-2935

Received on Wednesday, 3 October 2012 18:10:16 UTC