Re: Repository layout from Peter Linss on 2012-06-04 (public-test-infra@w3.org from April to June 2012)

From: Peter Linss <peter.linss@hp.com>
Date: Mon, 4 Jun 2012 10:41:43 -0700
To: Robin Berjon <robin@berjon.com>
Cc: James Graham <jgraham@opera.com>, "<public-test-infra@w3.org>" <public-test-infra@w3.org>
Message-Id: <DDF1546A-2392-4F88-BE37-3750A3E9B5B8@hp.com>
On Jun 4, 2012, at 9:08 AM, Robin Berjon wrote:

> On Jun 1, 2012, at 15:49 , Linss, Peter wrote:
>> On Jun 1, 2012, at 5:13 AM, Robin Berjon wrote:
>>> I know, but the problem is that since people are treating the file system layout as a meaningful-enough classification for tests, they're not bothering to update the manifest and resubmitting it. Ideally the metadata would be intrinsic to the test (externalised, authoritative metadata is almost always a bad idea) and the framework would automatically know of updates. Otherwise things are bound to get out of synch.
>> 
>> Well, that's the way the system was designed. If people don't update the inputs to the system it will always have bad data. 
> 
> Well obviously. But as we scale these tools we have to take the social component into account. If we notice that people aren't updating the useful metadata, then we ought to make sure that the tools compensate for that — it's a lot easier to patch tools than humans (in my experience at least ;).
> 
>>> No, they're definitely not, because most test suites do not require a build step.
>> 
>> Right, but eventually they should be using the build system to produce the manifests at least. For small suites that don't change often, manifests could be produced by hand, but that gets unmaintainable fast.
> 
> Sure, but if all the build step does is read the test files in order to produce the manifest, why not have the tool that reads the manifest read the test files directly, assuming a common metadata format (the foundations for which we have) and common rules for finding tests in a directory tree? I know Shepherd handles this — I'm mostly wondering about orthogonalising all of that and limiting the amount of integration between tools and various moving parts that we need to all be happy.

That should be doable. As it is now, Shepherd doesn't read the test files directly; it uses the same Python library that the build system uses. Making a direct import should be fairly simple, but it does introduce a dependency from the framework to the build system. That said, Shepherd already has that dependency, so if the server hosting the framework also hosts a Shepherd instance, all the requirements will already be there.
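
To make the "read the test files directly" idea concrete, here is a minimal sketch. It is not the library Shepherd or the build system uses. It assumes tests carry their metadata in the document head as <link rel="help"> and <meta name="flags"> / <meta name="assert"> elements, and the function names and file extensions are made up for illustration.

```python
import os
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects help links, flags and the assertion from one test file."""

    def __init__(self):
        super().__init__()
        self.metadata = {"help": [], "flags": [], "assert": None}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "help" and attrs.get("href"):
            self.metadata["help"].append(attrs["href"])
        elif tag == "meta" and attrs.get("name") == "flags":
            self.metadata["flags"] = (attrs.get("content") or "").split()
        elif tag == "meta" and attrs.get("name") == "assert":
            self.metadata["assert"] = attrs.get("content")


def read_test_metadata(source):
    """Return the metadata dict for the markup of a single test."""
    parser = MetadataParser()
    parser.feed(source)
    parser.close()
    return parser.metadata


def scan_tests(root, extensions=(".html", ".htm", ".xht")):
    """Walk root and return {relative_path: metadata} for each test file.

    The extension list is an assumption; a real tool would need agreed
    rules for which files in the tree count as tests.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(extensions):
                path = os.path.join(dirpath, filename)
                with open(path, encoding="utf-8") as handle:
                    relative = os.path.relpath(path, root).replace(os.sep, "/")
                    manifest[relative] = read_test_metadata(handle.read())
    return manifest
```

With something like this, the framework could call scan_tests() on a checkout instead of waiting for someone to resubmit a manifest, at the cost of the dependency mentioned above.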

> 
>>> I'm aware of Shepherd, but I've only seen it in use for CSS. It's hard to know if it is well suited for other suites or not with just that view. With that in mind it would be a good idea to integrate it on w3c-test.org to see if it applies well.
>> 
>> That's the plan. It was designed to be generic and configurable. It's still going through enough development that the DB schema is still getting adjusted, so while we're using it in production, I'm not sure it's stable enough to be set up on w3c-test.org as I don't have enough access to that box to keep it in sync with development. We should talk about that more in a few weeks though…
> 
> That's part of what worries me. Maintaining a single DB schema for all the varied needs that groups seem to have may require it to shoulder more weight than would be easy to work with. We can make this easier to handle using Doctrine migrations (or some similar solution, maybe Symfony Components have something for this nowadays) but it still requires a lot of integration. I'm wondering if a more "unixish" approach with small tools tied together in ad hoc manners would not be simpler for everyone involved (and allow us all to make progress without waiting on one another's updates).

With Shepherd I don't think the DB schema issue is one of different groups' needs; it's more that the system is still getting features (even just those for the CSS WG), and in the past few weeks I've found myself adding fields to support them. For now, I've been getting by with simply tweaking the DB on csswg.org as I update the code. But I've already been thinking about ways of dealing with this moving forward. I'd rather not introduce yet more tools that these systems are dependent on if it can be avoided simply. Pretty much all the changes I've been making lately can be done through straightforward SQL queries.
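
As a rough illustration of how far plain SQL can go without an extra migration tool, here is a minimal sketch. It is not Shepherd's code or schema: it uses Python's sqlite3 module as a stand-in database, and the table and column names are hypothetical. The idea is an ordered list of SQL statements plus a stored counter recording how many have been applied.

```python
import sqlite3

# Ordered schema changes; new ones are only ever appended.
# Table and column names here are hypothetical.
MIGRATIONS = [
    "CREATE TABLE tests (name TEXT PRIMARY KEY)",
    "ALTER TABLE tests ADD COLUMN flags TEXT",
    "ALTER TABLE tests ADD COLUMN reviewer TEXT",
]


def migrate(conn):
    """Apply any statements not yet applied and return the schema version."""
    # SQLite keeps a free-form integer per database in user_version;
    # here it counts how many entries of MIGRATIONS have been run.
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {number}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running migrate() a second time is a no-op, so an install that falls behind development can catch up in one step.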

> 
>> Another point. Both Shepherd and the framework were designed to have installs mapped one-to-one with test repositories (especially Shepherd; the framework is more flexible but there's still the test name uniqueness issue).
> 
> Not to come across as a dangerously radical anarchist, but could we perhaps identify tests with URIs? Hell, URLs even!

We can't simply refer to each test by the URL that it's located at, because they won't always be hosted in the same place.

We could theoretically give each test a unique ID independent of its filename and location, and yes those could be URLs. It'd have to be bound to the test's metadata and guaranteed to never change. Let me give that some thought as to how it would play with the existing tools.
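
Thinking out loud, a minimal sketch of what that could look like, with nothing here reflecting the existing tools: the permanent ID is the key, and the places a test happens to be hosted are just values attached to it. The class name and the example.org URLs are invented for illustration.

```python
class IdRegistry:
    """Maps permanent test IDs to the (changeable) places a test is hosted."""

    def __init__(self):
        self._locations = {}  # test_id -> set of current locations
        self._owner = {}      # location -> test_id

    def register(self, test_id, location):
        """Record that the test with this permanent ID is hosted at location."""
        owner = self._owner.get(location)
        if owner is not None and owner != test_id:
            raise ValueError(f"{location} already belongs to {owner}")
        self._owner[location] = test_id
        self._locations.setdefault(test_id, set()).add(location)

    def resolve(self, location):
        """Return the permanent ID for a hosted copy, or None if unknown."""
        return self._owner.get(location)

    def locations(self, test_id):
        """Return every known location of a test, sorted."""
        return sorted(self._locations.get(test_id, ()))


registry = IdRegistry()
# Hypothetical ID; it is a URL but is not where the test is served from.
margin_test = "http://example.org/test-id/margin-collapse-001"
registry.register(margin_test, "http://a.example.org/css/margin-collapse-001.xht")
registry.register(margin_test, "http://b.example.org/mirror/margin-collapse-001.xht")
```

Results reported against either hosted copy would then resolve to the same ID, provided the ID travels with the test's metadata.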

> 
>> I don't think every group should be dumping all their test suites into a single install. It won't scale to cover the entire W3C with a single instance. The deployment plan needs to be mapped out better before it gets out of hand. I consider the current install of the framework on w3c-test to be more experimental/evaluation than final.
> 
> Ok, that's a very useful data point because I think it is at odds with the expectations in developing the w3c-test setup (I'm not blaming you in any way, just pointing out the lack of alignment here :). If Shepherd isn't going to scale to the usage we'd intended, we definitely need a rethink.

Just to be clear, it's not that the tools won't scale to handle large numbers of tests; I was concerned more about scaling the test repositories. 

Shepherd is currently designed to manage a single repository, and that repository can contain any number of test suites. Within a repository there's an expectation that test names will be unique. So you need a separate instance for each repository (multiple instances can live on the same server easily). 

The framework isn't (currently) bound to a single repository since all its input is done via manifests. A single instance of the framework does, however, expect all test names to be unique (since tests can exist in multiple suites). 

I know that CSS, SVG and HTML each have their own repositories for test suite development, but I'm not sure what all the other groups are doing, i.e. do they each have their own repository or are some sharing? 

Managing test name uniqueness within a single repository is relatively simple; managing that across multiple repositories will be impossible. 

I'm just pointing out that there's a problem coming here if all the groups keep dumping all their suites into the same instance of the framework. Sooner or later there's going to be a name collision (if it hasn't happened already). The two ways to solve that are either to have multiple instances of the framework, mapping each instance to a test repository, or to augment the framework code to deal with it (which is probably the way to go; I'm thinking something like teaching it about mapping tests to repositories).
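
To illustrate the second option, a minimal sketch follows. This is not framework code: the function names and the shape of the input (a mapping from repository name to the test names in its manifests) are assumptions. The first function reports names that appear in more than one repository; the second shows that keying tests by (repository, name) keeps them distinct.

```python
from collections import defaultdict


def find_name_collisions(manifests):
    """Return {test_name: [repositories]} for names found in several repositories.

    manifests maps a repository name to an iterable of test names.
    """
    seen = defaultdict(set)
    for repository, names in manifests.items():
        for name in names:
            seen[name].add(repository)
    return {name: sorted(repos) for name, repos in seen.items() if len(repos) > 1}


def qualified_names(manifests):
    """Return the set of (repository, test_name) keys, unique across repositories."""
    return {
        (repository, name)
        for repository, names in manifests.items()
        for name in names
    }


# Hypothetical manifests from two repositories sharing one framework instance.
example = {
    "css": ["margin-collapse-001", "transform-001"],
    "svg": ["transform-001", "path-001"],
}
collisions = find_name_collisions(example)
```

Here "transform-001" collides when names alone are the key, while the qualified keys ("css", "transform-001") and ("svg", "transform-001") stay separate.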

I suppose I could also add the notion of multiple repositories to Shepherd if we really want a single instance to manage multiple test repositories. I just haven't thought about that yet. 

The bottom line of what I'm pointing out here is that I don't think this aspect has been formally planned out yet (at least not that I'm aware of), i.e., how many test repositories are there, where are they, and how do they map to tools. I think we need to discuss this and have a common plan (and we should probably start another thread for that).

Peter
Received on Monday, 4 June 2012 17:42:08 UTC