Re: stitching together APIs

On Tue, Oct 2, 2012 at 1:13 PM, Rion Dooley <deardooley@gmail.com> wrote:

> I agree with David's general point, though I'm wary of issues creeping up
> from different OS impacting the evaluation of the API itself. Perhaps using
> something like CloudFoundry or even a VM with a pretty vanilla stack would
> serve the same purpose and level the playing field some. Also, are we
> assuming the API is running on the HPC system or as a hosted service?
>
>
Sure. There is something I like about the simplicity of the tarball,
configure, make, run all being local, but I also see your points about
client support. I think it's safe to say that the target platform is
ultimately a Linux node sitting close to the batch and file systems of a
big Linux machine. That's something you can reliably mimic on, say, an
Apple laptop or a server node. Am I missing other platforms that we'd want
to support easy test drives on?

As for hosting, so far the assumption has mostly been that it runs at the
center. Some places like LLNL would have big issues with hosting out access
to the big iron. I'd let them speak to that, but that's my guess. BTW,
GlobusOnline has a hosted approach for data movement, an associated REST
API, etc. My feeling is that for execution and job control, the
HPC-system-based approach has some heavy upsides. What do others think?

Alternatively, we (NERSC, TACC, etc.) could be the cloud test/dev space for
the API. Sufficiently stubbed out, it would be hard for people to make
trouble with, and it would be a zero-step install. Think of those CMS demo
pages that let you log in for a test drive. Anything that makes it easier
for people to get a taste of HPC on the web, I am all for considering. I
suggest that if we go with CloudFoundry, NERSC/TACC, or whatever, we step
through what we're asking newcomers to do in order to try it out. This will
appeal both to the application/user folks who are interested in HPC web
interface options and to the facilities/center people who are going to
evaluate whether they can live with it.

One of the icebergs that sank the grid ship was the difficulty of getting
software and services up and running: highly layered middleware, hard
installs, pages of XML configs, etc. I am against all that.

As for names, there is no great value attached to "NERSC Web Toolkit". We
discussed with LLNL and others that it could be the Nice and Easy Web
Toolkit, or whatever else. Again, if the name can be changed to make the
API non-territorial in order to increase adoption, I am all for it. I have
no idea what it means, but I think Globus was a good choice. Big. Round.
Inclusive.

The auth, data, jobs, and metadata services seem to be a good starting
> place. We might also want some information services such as system
> discovery and monitoring. Given that this is meant to drive web apps…and
> hopefully future ones, perhaps supporting event and pub/sub services would
> also be helpful. Lastly, is the api in charge of monitoring itself or are
> we assuming that's a production detail the centers would implement
> themselves? One of the things we've done with AGAVE is provide both real
> time and historical uptime reports for our users. This service is deployed
> outside the api, and lets us know the ongoing stability of our services and
> the systems and services we depend on. We find that it also helps build
> trust with our users. I'm not sure that this service is really in the scope
> of the API, but it's one of those things that, until we had it, we always
> missed, but never knew it. What are other people's thoughts on this?
>
>
I'd go for monitoring as a core topic. Job status already is (a GET on a
job ID), as is queue monitoring (a GET on a system). Monitoring is
something my group does a lot of (app perf, FS perf, power/env monitoring,
etc.), so I know that scope creep is very possible here. What about these
topics?

system monitoring: uptime, core count, number of people logged in, date of
deployment, pub/sub on outages to steer workflow automation (back off when
an outage is announced)

self monitoring: introspection on the number of sessions and recent API
activity; could be admin-only.

FS monitoring: df for the web

data transfer monitoring: maybe this is GO's territory, not ours?
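To make the first three concrete, here is a rough sketch in Python of what
the responses might look like. Every endpoint name, field name, and value
here is hypothetical; the shapes just mirror NEWT's GET-on-a-resource
style.

```python
import json

# GET /system/<name>  -- system monitoring: uptime, cores, logins, deploy date
# (all names and numbers below are illustrative placeholders)
system_status = {
    "system": "examplehpc",       # placeholder system name
    "status": "up",
    "uptime_seconds": 864000,
    "core_count": 153216,
    "users_logged_in": 412,
    "deployed": "2012-09-01",
}

# GET /monitor/self  -- self monitoring (could be admin-only):
# sessions and recent API activity
self_status = {
    "active_sessions": 37,
    "api_calls_last_hour": 1520,
}

# GET /file/df/<fs>  -- FS monitoring: "df for the web"
fs_status = {
    "filesystem": "/scratch",
    "size_tb": 2000,
    "used_tb": 1450,
    "available_tb": 550,
}

print(json.dumps(system_status, indent=2))
```

A pub/sub outage feed would presumably push documents shaped like
system_status to subscribers rather than waiting for a poll.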

...one more topic...

At the risk of bloating the set of tasks ahead, I am leaning toward the
notion that task queues may also be a core concept. That gets us into the
wild and woolly space of workflows, but relying on the HPC batch queue
system to delineate a set of steps to be done is failing, IMO, at our site
at least. Batch queues don't scale for this and their latency is too high.
There are big wins in providing assistance to science teams who have 10^6
"things" they need to do "M at a time" and currently have no great solution
except writing their own control loops. So while I see a pressing need
there, I am not 100% sure that NEWT/AGAVE etc. is the right place for it.
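For flavor, here is a minimal Python sketch of the "M at a time" control
loop those teams hand-roll today. run_task and the task count are
placeholders for whatever the science team needs done; a task-queue
resource in the API would subsume exactly this pattern.

```python
from concurrent.futures import ThreadPoolExecutor

M = 8  # concurrency cap, e.g. a batch-system courtesy limit

def run_task(i):
    # Stand-in for one unit of work; a real version might POST to a
    # hypothetical /queue resource instead of computing locally.
    return i * i

tasks = range(100)  # stand-in for the 10^6 "things"

# The executor keeps at most M tasks in flight at any moment,
# which is the whole point of the hand-rolled control loops.
with ThreadPoolExecutor(max_workers=M) as pool:
    results = list(pool.map(run_task, tasks))

print(sum(results))
```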

Let's keep the conversation going. I am available to chat pretty much
anytime at 510-486-4748 if you have ideas or what I said here is unclear.

Cheers,

David

> https://foundation.iplantcollaborative.org/monitor/history
>
>  --
> Rion
> @deardooley
>
> On Oct 2, 2012, at 1:24 PM, David Skinner <deskinner@lbl.gov> wrote:
>
>
>
> On Tue, Oct 2, 2012 at 11:11 AM, Annette Greiner <amgreiner@lbl.gov> wrote:
>
>> Hi folks,
>> To frame the discussion for the October 11 conference call, I've started
>> thinking about how to go about putting together a first draft of a standard
>> API. It seems to me that it would be logical to simply blend the two APIs
>> we currently have, NEWT and the iPlant API (Agave). There's a lot they have
>> in common, though of course they have different terms for things. I would
>> suggest we choose our terms based on four principles:
>> coherence: terms in the API should have grammatical commonality with
>> other terms of similar function in the API
>> clarity: terms should be unambiguous
>> memorability: terms should be easy to associate mentally with their
>> meaning in the API
>> cross-center generalizability: terms should make sense in the context of
>> any HPC center
>>
>>
> Good points. One step toward the last one is to make a fake HPC center
> stubbed out in the software itself. This serves two purposes. 1) you get to
> try the software or develop on your laptop without touching the guts of
> your HPC center. 2) It provides a common meeting ground for all of us as a
> plain vanilla idealization of an HPC center. To be a little more specific,
> I am suggesting that auth, data, and job functions should have stub
> implementations that operate locally and, while ineffectual, are processed
> in a way that mimics a real HPC center.
>
> auth: just use an install-time configured password with a test user
> data: just move local files on disk
> jobs: just run the command (fork/exec).
> KVP store: use a local CouchDB or MongoDB instance.
>
> Once we have that stub implementation down and packaged, people can
> download and try the API without herculean effort.
>
> We'll also need to discuss the scope of the standard API. How much should
>> it cover? Clearly, centers should be free to do their own implementations;
>> we are just defining a set of REST calls that can be re-used across
>> implementations. But what functions should be left out of the standard? I'm
>> thinking here of functions that are not specific to HPC. One example is the
>> iPlant PostIt, which generates disposable URLs. I think that's a great
>> service to offer people, but I would suggest we leave it out of a standard
>> for HPC, since it isn't a function that arises from the HPC context. The
>> iPlant Apps and Profile features strike me similarly. NEWT has a liststore
>> feature that could also be seen as a non-HPC aspect of that API.
>>
>>
> The guiding model for NEWT thus far has been to stick to the core things
> you see in HPC center documentation. How do I log in, how do I move files,
> how do I run things. We don't need to be rigid about that but having a
> guiding principle with a decent level of simplicity seems prudent.
>
> We've also advocated an exception mechanism whereby you can step outside
> the API and do whatever you like. That provides some demarcation as to
> where the API stops and where custom machinery begins.
>
> -David
>
> What do other people think? How should we define what is in/out of the
>> spec?
>> -Annette
>> --
>> Annette Greiner
>> Outreach, Software, and Programming Group
>> NERSC, LBNL
>> amgreiner@lbl.gov
>> 510-495-2935
>>
>
>

Received on Wednesday, 3 October 2012 02:07:03 UTC