Re: stitching together APIs

On Oct 3, 2012, at 10:14 AM, Shreyas Cholia <scholia@lbl.gov> wrote:

Sorry for jumping in late, but I'm wondering whether it might be worth taking
a step back.

In my experience, HTTP APIs tend to be shaped by the specific requirements
and use cases behind them. In other words, the form and semantics of a
particular API tend to reflect the underlying needs of the services it
exposes and the applications using it.

The flip side of this is that there has been very little standardization
(deliberately, IMO) in the W3C space in terms of what an HTTP API should
look like. Moreover, most communities tend to shy away from defining a
generalized HTTP API; there are too many details that end up being
application-specific.

I'd like to pose a couple of questions:
1. What is the purpose of a common HTTP API for HPC when many of the
semantics are site and application specific?


And yet if you look at the user documentation for most HPC centers, it's all
the same: how do I log in, get/put data, start/stop jobs. Compilation is
missing from our discussion with respect to the web, but I am OK with that.
Likewise, we can scoot past the application particulars by treating
applications as generic boxes with input and output. That puts the onus on
the app folks for sure, but they are the ones who really understand what
switches and knobs the app has. I think web-based stdio/stderr, file access,
a KVP store, and web methods on files are the best we can do and stay simple.
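
To make that concrete, here is a rough sketch of what a gateway-side client
might do against hypothetical endpoints (the base URL, paths, and auth below
are all invented for illustration, not a spec):

import requests

BASE = "https://hpc.example.gov/api/v1"   # made-up base URL
AUTH = ("testuser", "testpass")           # placeholder credentials

# web-based stdio/stderr for a previously submitted job (job id is made up)
stdout = requests.get(BASE + "/jobs/12345/stdout", auth=AUTH).text
stderr = requests.get(BASE + "/jobs/12345/stderr", auth=AUTH).text

# web method on a file: fetch an output file the application produced
results = requests.get(BASE + "/files/home/testuser/run1/results.csv", auth=AUTH).content

# simple KVP store for the application's own bookkeeping
requests.put(BASE + "/store/run1/status", json={"value": "post-processing"}, auth=AUTH)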

Likewise, site specifics can be distilled to a few important
GET/PUT/POST/DELETE operations on things like machines, queues, files, and
users. There we put the onus on site operators to map an implementation to
their local specifics. For files this is straightforward, and mostly for
users too (auth, whoami). For jobs and tasks it can get trickier, since there
are batch scripts with their own syntax as well as command-line tasks. The
latter is not so hard.
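
As a strawman of that site-facing surface (the resource names and verbs here
are mine and purely illustrative, not anything we've agreed on):

# Strawman resource/verb map; names and verbs are illustrative only.
RESOURCES = {
    "machines": ["GET"],                           # system status, core counts, load
    "queues":   ["GET"],                           # queue names, limits, backlog
    "files":    ["GET", "PUT", "POST", "DELETE"],  # read, write, upload, remove
    "jobs":     ["GET", "POST", "DELETE"],         # status, submit, cancel
    "users":    ["GET"],                           # whoami, account/allocation info
}
# e.g. POST a batch script to /machines/<name>/jobs to submit; DELETE /jobs/<id> to cancel.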

Batch scripts, at least for NERSC, are hard, IMO. The grid vision of writing
a workflow once and running it anywhere (within and between centers) is one I
am willing to throw under the bus if it means we can have a simple web API,
even if people then have to adapt between PBS/SGE and adjust queue settings
in their app or web submission form.

"But David" you say "don't you want to make it easy and seamless for
everyone?" To which I'd answer: yes, I do, but if the cost is the millstone
of resource discovery and new, unfamiliar ways of expressing batch jobs, then
that is not easy, and it sets a course toward the grid destination of a
computing fabric that is fully general in principle but never makes it into
practice. The queue names and the "-l mppblah" settings are always going to
be site-idiosyncratic. If we shoot too high in terms of seamless-everywhere
operation, I think we'll miss the target. My $0.02.


2. Is interoperability a realistic goal (cf. "The Grid")? Is it enough that
everyone just speaks HTTP?


Not sure.

-David

I'm wondering if it might make more sense to work on a best-practices
document, something that says: here are some things to consider when defining
your own HTTP API, and here are some sample APIs to use as a baseline. This
might turn into a reference API that serves as a starting point for others.

This may be where we are headed anyway, in which case feel free to carry on.

-Shreyas

On Wed, Oct 3, 2012 at 8:09 AM, Rion Dooley <deardooley@gmail.com> wrote:

>
> On Oct 2, 2012, at 9:06 PM, David Skinner <deskinner@lbl.gov> wrote:
>
>
>
> On Tue, Oct 2, 2012 at 1:13 PM, Rion Dooley <deardooley@gmail.com> wrote:
>
>> I agree with David's general point, though I'm wary of issues creeping in
>> from different OSes affecting the evaluation of the API itself. Perhaps
>> using something like Cloud Foundry, or even a VM with a pretty vanilla
>> stack, would serve the same purpose and level the playing field some. Also,
>> are we assuming the API is running on the HPC system or as a hosted service?
>>
>>
> Sure. There is something I like about the simplicity of the tarball,
> configure, make, run all being local, but I also see your points about
> client support. I think it's safe to say that the target platform is
> ultimately a Linux node sitting close to the batch and file systems of a big
> Linux machine. That's something you can reliably mimic on, say, an Apple
> laptop or a server node. Am I missing other platforms that we'd want to
> support easy test drives on?
>
> As for hosting, so far the assumption has mostly been that it runs at the
> center. Some places like LLNL would have big issues with hosting out access
> to the big iron. I'll let them speak to that, but that's my guess. BTW,
> Globus Online has a hosted approach for data movement, an associated REST
> API, etc. My feeling is that for execution and job control, the
> HPC-system-based approach has some heavy upsides. What do others think?
>
>
> I see your point about LLNL and other high security situations, but it's
> fair to consider the many centers where admins simply won't allow services
> like these to run on their head nodes. Also, are we assuming it runs as
> root, or in user space?
>
>
> Alternatively, we (NERSC, TACC, etc.) could be the cloud test/dev space for
> the API. Sufficiently stubbed out, it would be hard for people to make
> trouble with, and it would be a zero-step install. Think of those CMS demo
> pages that let you log in for a test drive. Anything that makes it easier
> for people to get a taste of HPC on the web, I am all for considering. I
> suggest that whether we go with Cloud Foundry, NERSC/TACC, or whatever, we
> step through what we're asking newcomers to do in order to try it out. This
> will appeal both to the application/user folks who are interested in HPC web
> interface options and to the facilities/center people who are going to
> evaluate whether they can live with it.
>
>
> One of the benefits of running on Cloud Foundry or any other PaaS would be
> that the API services could be deployed to any "cloud" as well as the
> desktop.
>
>
> One of the icebergs that sank the grid ship was the difficulty of getting
> software and services up and running: highly layered middleware, hard
> installs, pages of XML configs, etc. I am against all that stuff.
>
>
> Amen. So at the very least we'll need a clean web app with a simple
> wizard/form to configure the scheduler, account types, auth mechanism,
> default data protocol and file system, admin accounts, and some performance
> preferences. What else have I missed?
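>
> A strawman of what that wizard might emit, just to make the shape of the
> configuration concrete (every key and value here is invented, not an actual
> NEWT/AGAVE format):
>
> # Invented example of what a setup wizard could produce.
> SITE_CONFIG = {
>     "scheduler": "torque",                   # or "sge", "slurm", ...
>     "auth": "pam",                           # auth mechanism: pam, ldap, myproxy, ...
>     "account_types": ["user", "project"],
>     "data_protocol": "gridftp",              # default data movement protocol
>     "default_filesystem": "/global/scratch",
>     "admins": ["admin1"],
>     "max_sessions": 500,                     # a sample performance preference
> }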
>
>
> As for names, NERSC Web Toolkit has no great value attached to it. We
> discussed with LLNL and others that it could be the Nice and Easy Web
> Toolkit, or whatever else. Again if the name can be changed to make the API
> non-territorial in order to increase adoption I am all for it. I have no
> idea what it means but I think Globus was a good choice. Big. Round.
> Inclusive.
>
>
>> The auth, data, jobs, and metadata services seem to be a good starting
>> place. We might also want some information services such as system
>> discovery and monitoring. Given that this is meant to drive web apps…and
>> hopefully future ones, perhaps supporting event and pub/sub services would
>> also be helpful. Lastly, is the API in charge of monitoring itself, or are
>> we assuming that's a production detail the centers would implement
>> themselves? One of the things we've done with AGAVE is provide both
>> real-time and historical uptime reports for our users. This service is
>> deployed outside the API, and it lets us know the ongoing stability of our
>> services and the systems and services we depend on. We find that it also
>> helps build trust with our users. I'm not sure this service is really in
>> the scope of the API, but it's one of those things that we didn't know we
>> were missing until we had it. What are other people's thoughts on this?
>>
>>
> I'd go for monitoring as a core topic. Job status already is (GET on a job
> ID), as is queue monitoring (GET on a system). Monitoring is something my
> group does a lot of (app perf, FS perf, power/env monitoring, etc.), so I
> know that scope creep is very possible here. What about these topics?
>
>
> I suppose this depends on what the purpose of the API is. If it's a target
> for developers to build apps that let end users access the resources and
> conduct science from the web, then providing sysadmin tools might be
> overkill. I'm not sure our next chemistry gateway would see much value in
> having access to power consumption stats on each job. It would make for
> some great visualizations, though. The question seems to be who our target
> audience is.
>
>
> system monitoring: uptime, core count, number of people logged in, date of
> deployment, pub/sub on outages to steer workflow automation (back off when
> an outage is announced)
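>
> A rough sketch of the back-off piece, assuming a hypothetical /events/outages
> feed (real pub/sub could push the same information instead of polling):
>
> # Hypothetical sketch: pause submission while an announced outage is active.
> import time
> import requests
>
> BASE = "https://hpc.example.gov/api/v1"   # made-up base URL
>
> def outage_active():
>     events = requests.get(BASE + "/events/outages").json()
>     return any(e.get("active") for e in events)
>
> def submit_when_clear(job):
>     while outage_active():
>         time.sleep(300)                   # back off until the outage clears
>     return requests.post(BASE + "/jobs", json=job)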
>
>
> This all seems like good info. Do we envision supporting single or
> multi-tenancy? If running on the HPC system, do we foresee API sessions
> being tied to system sessions?
>
>
> self monitoring: introspection on the number of sessions and recent API
> activity; could be admin-only.
>
>
>
> FS monitoring: df for the web
>
> data transfer monitoring: maybe this is GO's territory, not ours?
>
>
> There are some significant auth challenges to doing this in the general
> case, and if we want to ship something as a deployed solution, we need to
> support sftp, ftp, fops, gridftp, and irods out of the box so users can
> access data however they see fit.
>
>
> ...one more topic...
>
> At the risk of bloating the set of tasks ahead, I am leaning towards the
> notion that task queues may also be a core concept. That gets us into the
> wild and woolly space of workflows, but relying on the HPC batch queue
> system to delineate a set of steps to be done is failing, IMO, at our site
> at least. Batch queues don't scale for this and their latency is too high.
> There are big wins in providing assistance to science teams who have 10^6
> "things" they need to do "M at a time" and currently have no great solutions
> except writing their own control loops. So while I see a pressing need
> there, I am not 100% sure that NEWT/AGAVE etc. is the right place for it.
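>
> Just to illustrate the "10^6 things, M at a time" control loop that teams
> end up hand-rolling today (a sketch only, not a proposal for how NEWT/AGAVE
> should implement task queues):
>
> # Minimal hand-rolled control loop: run a large task list, at most M at a time.
> from concurrent.futures import ThreadPoolExecutor
> import subprocess
>
> M = 32                                        # whatever concurrency the site tolerates
> tasks = ["./process.sh input_%06d" % i for i in range(1000000)]   # hypothetical commands
>
> def run(cmd):
>     return subprocess.call(cmd, shell=True)  # in a web API world: a POST to a tasks endpoint
>
> with ThreadPoolExecutor(max_workers=M) as pool:
>     results = list(pool.map(run, tasks))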
>
>
>
> Were you thinking of pulling from an existing workflow project to
> implement this? Do you have a preference? AGAVE is all queue-driven on the
> backend, so if you wanted to reuse the preprocessing mechanism from our IO
> and Data services, which chains together a series of data transforms to
> process data as it passes in and out of our Data Store, I'm happy to donate
> the code. While I don't think it's exactly what you're describing, it gets
> us part of the way there, and it's built on the Quartz framework, so the
> technology is well established. It might be a starting point. Another
> option might be looking at Airavata, Apache ODE, Spring, jBPM, etc., given
> that they already have well-defined mechanisms for doing this.
>
> What kind of interoperability are we targeting with these services? Are we
> striving for near-compliance with standards (given that full compliance
> probably isn't reasonable), or for usability?
>
> Let's keep the conversation going. I am available to chat pretty much
> anytime at 510-486-4748 if you have ideas or if what I said here is unclear.
>
> Cheers,
>
> David
>
> https://foundation.iplantcollaborative.org/monitor/history
>>
>>  --
>> Rion
>> @deardooley
>>
>> On Oct 2, 2012, at 1:24 PM, David Skinner <deskinner@lbl.gov> wrote:
>>
>>
>>
>> On Tue, Oct 2, 2012 at 11:11 AM, Annette Greiner <amgreiner@lbl.gov> wrote:
>>
>>> Hi folks,
>>> To frame the discussion for the October 11 conference call, I've started
>>> thinking about how to go about putting together a first draft of a standard
>>> API. It seems to me that it would be logical to simply blend the two APIs
>>> we currently have, NEWT and the iPlant API (Agave). There's a lot they have
>>> in common, though of course they have different terms for things. I would
>>> suggest we choose our terms based on four principles:
>>> coherence: terms in the API should have grammatical commonality with
>>> other terms of similar function in the API
>>> clarity: terms should be unambiguous
>>> memorability: terms should be easy to associate mentally with their
>>> meaning in the API
>>> cross-center generalizability: terms should make sense in the context of
>>> any HPC center
>>>
>>>
>> Good points. One step toward the last one is to make a fake HPC center
>> stubbed out in the software itself. This serves two purposes: 1) you get to
>> try the software or develop on your laptop without touching the guts of
>> your HPC center, and 2) it provides a common meeting ground for all of us
>> as a plain-vanilla idealization of an HPC center. To be a little more
>> specific, I am suggesting that auth, data, and job functions should have
>> stub implementations that operate locally; while ineffectual, they should
>> behave in a way that mimics a real HPC center.
>>
>> auth: just use an install-time-configured password with a test user.
>> data: just move local files on disk.
>> jobs: just run the command (fork/exec).
>> KVP store: use a local CouchDB or MongoDB instance.
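>>
>> To make the stub idea concrete, a minimal sketch of the auth, data, and job
>> stubs could be as small as this (purely illustrative, not NEWT code):
>>
>> # Illustrative stub backends for a fake HPC center; everything is local.
>> import shutil
>> import subprocess
>> import uuid
>>
>> TEST_USER = ("testuser", "change-me-at-install-time")  # stub auth account
>>
>> def authenticate(user, password):
>>     return (user, password) == TEST_USER
>>
>> def move_data(src, dst):
>>     shutil.move(src, dst)                    # "transfer" is just a local file move
>>
>> def submit_job(command):
>>     job_id = str(uuid.uuid4())
>>     proc = subprocess.Popen(command, shell=True)   # "batch submit" is just fork/exec
>>     return job_id, proc
>>
>> # KVP store: point at a local CouchDB or MongoDB instance.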
>>
>> Once we have that stub implementation down and packaged, people can
>> download and try the API without herculean effort.
>>
>>> We'll also need to discuss the scope of the standard API. How much should
>>> it cover? Clearly, centers should be free to do their own implementations;
>>> we are just defining a set of REST calls that can be reused across
>>> implementations. But what functions should be left out of the standard? I'm
>>> thinking here of functions that are not specific to HPC. One example is the
>>> iPlant PostIt, which generates disposable URLs. I think that's a great
>>> service to offer people, but I would suggest we leave it out of a standard
>>> for HPC, since it isn't a function that arises from the HPC context. The
>>> iPlant Apps and Profile features strike me similarly. NEWT has a liststore
>>> feature that could also be seen as a non-HPC aspect of that API.
>>>
>>>
>> The guiding model for NEWT thus far has been to stick to the core things
>> you see in HPC center documentation. How do I log in, how do I move files,
>> how do I run things. We don't need to be rigid about that but having a
>> guiding principle with a decent level of simplicity seems prudent.
>>
>> We've also advocated an exception mechanism whereby you can step outside
>> the API and do whatever you like. That provides some demarcation as to
>> where the API stops and where custom machinery begins.
>>
>> -David
>>
>> What do other people think? How should we define what is in/out of the
>>> spec?
>>> -Annette
>>> --
>>> Annette Greiner
>>> Outreach, Software, and Programming Group
>>> NERSC, LBNL
>>> amgreiner@lbl.gov
>>> 510-495-2935
>>>
>>
>

Received on Wednesday, 3 October 2012 20:35:13 UTC