From: Rene Rivera <grafik.list <at> redshift-software.com>
Subject: [testing] Possible changes to testing..
Newsgroups: gmane.comp.lib.boost.testing, gmane.comp.lib.boost.devel
Date: 2005-03-09 02:16:53 GMT
All,

I've been reading all the testing-related posts without having time to 
respond to them, so I decided to talk about what might be done to 
improve things in one big posting.

For a long time now one of my objectives has been to use BuildBot
(http://buildbot.sf.net/) to improve the management of running the
regression tests. For those who don't feel like reading about that
software here's a quick summary:

It is a client-server configuration where the test server controls
the test clients, telling them what to test. In this arrangement
the clients do not directly control what they are testing, nor how. All
the clients provide is an execution environment for the testing. The
server is configured to control the various clients as to what they
test, including how to get the source, and when they test. The testing
is change driven instead of time driven as we currently have. This means
that each time a change occurs on the source (CVS in our case) the
server decides to tell the clients to run tests. Because of this direct
communication to the clients the server is able to give direct feedback
as to what the testing is doing. This includes a live display of test
progress, down to a dynamic view of the test log.
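
As a rough illustration of the change-driven idea described above, here
is a minimal Python sketch. It is not BuildBot's API: the Server and
Client classes, the client names and the revision labels are all made
up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical test client: it only provides an execution environment."""
    name: str
    tested: list = field(default_factory=list)

    def run_tests(self, revision):
        # A real client would check out the source and run the tests here.
        self.tested.append(revision)
        return f"{self.name}: tested {revision}"


class Server:
    """Hypothetical test server: it decides what is tested, and when."""

    def __init__(self, clients):
        self.clients = clients
        self.last_revision = None

    def on_change(self, revision):
        # Change driven: nothing happens unless the source actually changed.
        if revision == self.last_revision:
            return []
        self.last_revision = revision
        return [client.run_tests(revision) for client in self.clients]


server = Server([Client("linux-gcc"), Client("win-vc7")])
print(server.on_change("rev-1"))  # both clients are told to test rev-1
print(server.on_change("rev-1"))  # no new change, so no new test run
```

The point of the sketch is only that the decision to test lives on the
server, so the server always knows what each client is doing.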

Here are some of the issues, how this step might help, and other 
changes I've thought about that might help.

*Reproducibility*

BuildBot controls what the checkouts are and what triggered the build. 
You can see this by selecting one of the yellow "Build #" links on the 
BuildBot display. If there's ever a question about what code produced an 
error one can get the specific version of the tree to attempt 
reproduction. The history of what tests have run is kept. Which means 
that we can finally answer the dreaded "When did that start breaking?" 
question.
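
To make the "When did that start breaking?" point concrete, here is a
small Python sketch of how a kept build history could answer it. The
record layout (build number, source timestamp, pass/fail) is an
assumption for illustration, not how BuildBot stores its history.

```python
def first_breaking_build(history):
    """Return the build that started the current run of failures.

    history: list of (build_number, source_timestamp, passed) tuples in
    chronological order. Returns None if the latest build passed.
    """
    breaking = None
    for build in history:
        passed = build[2]
        if passed:
            # Any pass ends the previous run of failures.
            breaking = None
        elif breaking is None:
            # First failure after a pass (or at the very start).
            breaking = build
    return breaking


# Hypothetical history for one test on one client.
history = [
    (1, "2005-03-01 10:00", True),
    (2, "2005-03-02 09:30", False),
    (3, "2005-03-03 14:15", True),
    (4, "2005-03-05 08:00", False),
    (5, "2005-03-06 11:45", False),
]
print(first_breaking_build(history))
```

With the source timestamp of that build in hand, one can check out the
matching tree and try to reproduce the failure.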

*Scalability*

One big change I'd like to see is the breakup of running tests to make 
it possible for testers to run tests on subsets of libraries, or even 
individual libraries. For example there might be some testers which have 
a special interest in a particular library, Boost.Python comes to mind. 
It would be ideal to make it possible for them to only run those tests, 
and to go through the extra steps of doing the Python setup. Also for 
some popular platforms, it becomes possible to get much faster response 
rates from testing if, for example, we partition the library testing 
space throughout those platforms and they would test the libraries in 
parallel.
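
One simple way to picture the partitioning is a round-robin split of
the libraries over the clients of one platform. This Python sketch only
illustrates the idea; the client names are invented and a real split
would probably weigh libraries by how long their tests take.

```python
def partition(libraries, clients):
    """Deal the libraries out to the clients one at a time, round-robin."""
    shares = {client: [] for client in clients}
    for index, library in enumerate(sorted(libraries)):
        shares[clients[index % len(clients)]].append(library)
    return shares


libraries = ["python", "config", "regex", "spirit", "graph"]
clients = ["linux-gcc-1", "linux-gcc-2"]  # hypothetical client names
for client, share in partition(libraries, clients).items():
    print(client, share)
```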

For this to happen some significant organizational changes need to 
happen to the tests. As it currently stands such a division is not 
possible because of the way tests are defined and organized. We have a 
single test point, status/Jamfile, which a) points to the rest of the 
test points, libs/*/test/Jamfile, and b) defines its own tests. The 
problem is that there is a conflict, for some libraries, between the 
tests defined in status/Jamfile, and the tests defined in 
libs/*/test/Jamfile. For example Boost.Config has a reduced set of tests 
in status/Jamfile. This situation is likely an attempt to reduce test 
times to something manageable. And I understand that library authors 
would need to have a place to run their own set of comprehensive tests.

My proposal is to create a set of canonical tests that comprise the 
regression testing suite independent of the library author's tests. This 
set of tests would be structured so that it's possible to run each 
library's tests independently of the others. It would mean adding this type of 
structure:

boost-root/tests/<library>/Jamfile
boost-root/tests/<library>/<sub-library>/Jamfile
boost-root/tests/<library>/<some-other-grouping>/Jamfile

So for example a tester could say she only wants to test python/*, or 
numeric/ublas/*, etc.
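
A tester's selection could then be a plain list of patterns matched
against the Jamfile paths under boost-root/tests. The Python sketch
below assumes shell-style patterns and an invented set of paths; it is
not part of bjam, Boost.Build or BuildBot.

```python
from fnmatch import fnmatchcase


def select_tests(jamfiles, patterns):
    """Keep the Jamfile paths that match at least one tester pattern."""
    return [
        path for path in jamfiles
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Hypothetical paths, relative to boost-root/tests.
jamfiles = [
    "config/Jamfile",
    "python/Jamfile",
    "numeric/ublas/Jamfile",
    "numeric/interval/Jamfile",
]
print(select_tests(jamfiles, ["python/*", "numeric/ublas/*"]))
```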

*Fragility*

Simply put, because BuildBot exposes the complete testing procedure it 
becomes much easier to see where problems occur in the testing. Also, 
because there is a central server managing the procedures it's more 
likely that problems can be solved at that one location instead of 
opaquely trying to fix the clients. With, hopefully, additional 
distributed testers it also becomes harder for testing to break down completely.

Another aspect we need to address is fragility of the server. Ideally we 
would have multiple BuildBot servers to add redundancy. To make such a 
multi-server, multi-client setup resource efficient we would need to 
manage the distribution of testing between them.

*Resources*

The only predictable way to address the resource usage is to distribute 
the testing so we can create more capacity. Breaking up the tests is the 
only way I can see to move there. It was already suggested that only 
slicing the testing to single toolsets would help. But that doesn't 
really address the problems. For example it would still be prohibitive 
for me to run tests on my G3 Mac for CW-8.3 because it's a slow machine 
and it would take days to run just one cycle of tests making the results 
useless. But it would be possible for me to run a minimal set of tests, 
for example Boost.Config and other basic tests.

Restructuring also brings up the possibility of moderating the set of 
tests that make it into the regression suite. Right now we are in a 
position of asking library authors to define what the tests for the 
library are. We can't moderate what gets into testing, or what needs to go out. 
We need some procedure for reviewing tests, and some form of approval to 
get tests into the regression system. I'm not saying that authors would 
not be able to do additional testing, just that we need to define what 
the standard set of tests are so that we can concentrate our testing 
efforts. It would still be possible to set up additional resources to 
run "experimental" tests.

*Response*

The gain from segmentation and distribution of testing is hopefully 
obvious ;-) But another advantage of using BuildBot is that we are not 
tied to waiting for the XSLT processing to see results. Sure the results 
are not going to be as incredibly well organized as the Meta-Comm 
results but they are immediately available. So if there is a significant 
delay in the processing, because of load or breakage, we can still 
continue working.

*Releases*

Managing the testing for a release was brought up many times. And it's 
clear that requiring testers to do manual changes is just not working. 
With BuildBot, having control of what is tested on the server means 
that at any point one person, or some small number of people, can make the 
switch to have testing resources devoted to release testing. One 
possibility that the finer-grained testing allows for is to limit the 
testing for a release to the required set of toolsets and platforms.

OK, that's enough rambling... I know I haven't mentioned many other 
items raised, particularly about bjam and Boost.Build. I'll leave that 
for another post. I just need to get back to setting up the BuildBot 
server now ;-)

-- 
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq