Why Functional Tests don’t belong in a Build Environment

The previous part discussed why a unit test for a class should be written by the developer of that class, and why a functional test should be created by an independent tester. This post argues that functional tests should not be part of the build process of the product, but should instead be developed and executed separately. To that end, I give guidelines for setting up an independent validation system.

Unit tests are written by the developer simultaneously with the production code. If the API or the functionality of a class in the production code changes, the corresponding unit test class has to be adapted accordingly; otherwise the build executing the unit test will fail. This is not a problem in principle, since both should be changed by the same developer at the same time and committed only once they are consistent.

What about functional tests? A functional test and its test data are coupled to a business requirement and to the part of the product that fulfills this requirement. So if that part of the product changes, the functional tests must change as well.

If the functional tests run in the same build as the product, any change to the tested behaviour of the product will cause the build to fail, unless the developer and the tester change their code simultaneously. This would require tight synchronization between developer and tester, which is usually not feasible.

The result is that functional tests almost always fail if they are integrated into the build environment. At the very least, this calls for separate builds for the product (code plus unit tests) and for the functional tests. But I will go further: I recommend not running the functional tests in the build environment at all. The build environment does not make functional testing any easier; on the contrary, it adds a lot of extra complexity to the test environment.

The following diagram compares the two approaches of running functional tests, i.e. in a build environment and in a separate test environment against an independent validation system (or staging system).

System setup for unit tests within a build system and functional tests against a validation system. On the left side: unit tests within a build system are triggered by the build system (Jenkins/Hudson, Luntbuild, Ant, Maven etc.) and executed locally. The application runs on the build server, ideally set up by the build system and the unit test framework (JUnit, TestNG) before the unit tests are executed. On the right side: functional tests are executed by test drivers on dedicated test clients against a validation system, which is set up and configured like the live system.

How to run GUI-/Browser-based Tests

If you run functional tests in the build environment, they must be executable as a batch run driven by the build framework. Specifically, on the build system, the build framework (e.g. Jenkins, formerly known as Hudson, or Luntbuild) starts a build script (Maven, Ant), which starts a test framework (JUnit, TestNG), which executes the test case. Of course, this is technically possible. There are functional test tools and build environments that have such capabilities, but they bring extra complexity and entail extra development and significant configuration effort (often in the range of 1-2 weeks). And even then, the restrictions are severe.

It is very hard to develop GUI-related or browser-based functional tests in an environment without visible screen output. How do you debug test failures if they occur only within the build environment? How do you create keyboard and mouse events (e.g. entering text in an applet within a browser)?

If the functional tests are executed on dedicated test clients (often the tester’s PC), then any test and scripting software available for the desktop OS of that PC can be used. What’s more, the tester can easily observe the test execution (GUI tests) and debug it, so test development has fewer obstacles in its path.

Merits of an Independent Validation System

Another important point is testing the configuration of the system. If the system under test runs within the build framework, its configuration usually deviates strongly from the configuration of the live system. An independent validation system can be set up and configured much closer to the live system, including hardware, installed software and components, and the configuration of all parts. Thus, a functional test against the validation system checks a large part of the system configuration as well, and finds system-specific errors and failures in the deployment of the system.
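How close the validation system really is to the live system can be checked mechanically. The following is a minimal sketch in Python, assuming that both configurations have already been exported as flat key/value dictionaries; the function name config_drift, the keys and the values are hypothetical and only serve as an illustration.

```python
MISSING = "<missing>"  # marker for a key that exists on only one of the systems


def config_drift(live, validation, ignore=()):
    """Return {key: (live_value, validation_value)} for every differing key.

    Keys listed in `ignore` (e.g. host names, which must differ) are skipped.
    """
    drift = {}
    for key in sorted(set(live) | set(validation)):
        if key in ignore:
            continue
        live_value = live.get(key, MISSING)
        validation_value = validation.get(key, MISSING)
        if live_value != validation_value:
            drift[key] = (live_value, validation_value)
    return drift


if __name__ == "__main__":
    # Hypothetical example data, not taken from a real system.
    live = {"host": "live01", "jvm.heap": "4g", "db.pool": 50}
    validation = {"host": "val01", "jvm.heap": "2g", "db.pool": 50}
    for key, (a, b) in config_drift(live, validation, ignore={"host"}).items():
        print(f"{key}: live={a} validation={b}")
```

Every entry in the result is a place where a functional test on the validation system may pass although the live system would behave differently, so the list should be kept as short as possible.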

A test on the validation system comes close to satisfying the well-known software development principle that work is only done when the requirement is fulfilled on the live system. Manual reproduction and repetition of failed tests are also easier if the test is executed from the same client against the same system under test (validation system) with the same product version, the same configuration and the same test data.

There is one exception. If the functionality of the product is accessed through an API instead of a GUI, and if the product or parts of it are already very stable, then it can be useful to put all stable functional tests into the build environment as regression tests.
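Which functional tests qualify for the build environment can be written down as a simple selection rule. The sketch below is only an illustration: the record fields, the function name regression_candidates and the stability criterion (a minimum number of consecutive passes) are my assumptions, since the text above does not define "stable" in numbers.

```python
def regression_candidates(tests, min_consecutive_passes=10):
    """Return the names of tests that may move into the build environment.

    A test qualifies if it drives the product through an API (not a GUI)
    and has passed at least `min_consecutive_passes` times in a row.
    The threshold is an assumption, not a recommendation from the text.
    """
    return [
        test["name"]
        for test in tests
        if test["interface"] == "api"
        and test["consecutive_passes"] >= min_consecutive_passes
    ]


if __name__ == "__main__":
    # Hypothetical test inventory.
    inventory = [
        {"name": "create_order", "interface": "api", "consecutive_passes": 25},
        {"name": "cancel_order", "interface": "api", "consecutive_passes": 3},
        {"name": "checkout_page", "interface": "gui", "consecutive_passes": 40},
    ]
    print(regression_candidates(inventory))
```

GUI tests stay on the dedicated test clients regardless of how often they have passed, for the reasons given in the previous section.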

Stable Builds, or when to Deploy on the Validation System

If a lot of effort is invested in builds with high quality standards, such as high unit test coverage or code reviews before a commit, then the builds are usually stable enough to be deployed on the validation system as a basis for the execution of functional tests. If, however, the builds are not very stable, there should be an additional pre-validation system, which is used to stabilize a build. As soon as a build is deployed on the pre-validation or validation system, functional tests are executed against it.

In order to adapt existing functional tests and to prepare new ones, the testers must be aware of all functional changes and additions that are to be expected in the new build. How this information is exchanged depends on the team size: small teams can use ad-hoc meetings with the developers, while larger teams need the support of detailed feature lists maintained in task management tools like JIRA.

For all failing functional tests, the requirements must be checked (often in collaboration with the developer), to find out whether the test or the product is working incorrectly. After the errors in the functional tests are fixed, the functional tests are repeated. The remaining failing tests should then all be caused by flaws of the product.

A build is considered stable if and only if the functional tests checking the main paths of the business functionality have passed, so that the execution of further tests is feasible and makes sense. In projects with longer development cycles (several months), stable builds should be achieved every one or two weeks, and towards the end in even shorter intervals. In projects with short development cycles (several weeks), stable builds should be achieved every one or two days. Overall, 10 to 15 stable builds should be achieved per development cycle. Each stable build shows that the development and test groups are in sync and the product is stable. Close collaboration between the functional testers and the developers is essential in such a setup.
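The stability criterion and the suggested frequencies above can be made concrete with a few lines of Python. This is a minimal sketch: the class FunctionalResult and both function names are hypothetical, and treating a build without any main-path result as not stable is my assumption, since an untested build proves nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalResult:
    name: str
    main_path: bool  # does the test cover a main path of the business functionality?
    passed: bool


def is_stable(results):
    """A build is stable iff all main-path functional tests have passed.

    Failures outside the main paths do not block a stable build.
    Assumption: a build without any main-path result is not stable.
    """
    main_path_results = [r for r in results if r.main_path]
    return bool(main_path_results) and all(r.passed for r in main_path_results)


def stable_build_interval_days(cycle_days, stable_builds):
    """Average number of days between stable builds within one cycle."""
    return cycle_days / stable_builds


if __name__ == "__main__":
    results = [
        FunctionalResult("place_order", main_path=True, passed=True),
        FunctionalResult("export_report_as_pdf", main_path=False, passed=False),
    ]
    print(is_stable(results))
    # A 12-week cycle (84 days) with 10 to 15 stable builds:
    print(stable_build_interval_days(84, 10), stable_build_interval_days(84, 15))
    # A 3-week cycle (21 days) with 10 to 15 stable builds:
    print(stable_build_interval_days(21, 10), stable_build_interval_days(21, 15))
```

For a 12-week cycle this gives one stable build roughly every 6 to 8 days, and for a 3-week cycle one every 1.4 to 2.1 days, which matches the intervals of one or two weeks and one or two days named above.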

To summarize, I emphasize the following points:

  • Functional tests are executed on dedicated test clients.
  • Functional tests are executed against a dedicated validation system.
  • Only stable builds are deployed on the validation system.
  • Stable builds should be achieved regularly and with adequate frequency throughout the development cycle.

In a later post, I will discuss test coverage: when to write and rely on unit tests, when to rely on functional tests, and how to derive test coverage. A rough guide for selecting the right tools will be given as well.
