
12
Aug
2011

The Big Picture : Part 4 - The Automation Environment

Continuous Deployment

Human error is fundamental to the endeavor of creating software. Our ultimate aim is not to remove error completely (an impossible task), but to be aware when errors are introduced. This awareness is what makes for better software, not simply automation itself. There is a large class of behaviors where the computer does not know or care which behavior is correct. Being able to use relative judgement is still a characteristically human trait.

Automation is the 80/20 solution (give or take a few percent). It should cover a majority of the test cases, but depending on the complexity of the application, automation does not generally encompass all possible tests. What it does is remove the tedium from tests which are amenable to automation. This allows manual tests (which should still be done) to focus on the really hard 20 percent where no amount of automation (barring artificial intelligence) is sufficient to determine the correctness of behavior.

There are a few key components to creating the automation environment - Continuous Integration, Virtualization, and Scripting. Each is discussed in detail.

Jenkins/Hudson

Continuous Integration is the cornerstone of any automation effort. If you have not integrated CI into your development practices, stop and invest the time to do so. Without CI, you do not have a means of coordinating the efforts of the entire organization to improve the quality of the produced software.

The original focus of CI was on developers. By generalizing this principle to the entire application stack, we move towards Continuous Deployment. Not only are applications tested (via unit / integration tests for code level validation) when built, but the built artifacts themselves are deployed without user intervention to run complete end-to-end tests. This type of testing ensures that the software will work when fully integrated and deployed, not simply tested in isolation.

Automated acceptance tests should be run when either a new version of the application is available or when tests against the application have changed. Test execution is coordinated through a dependent project which is aware of either application or test project changes. I strongly suggest separating the execution of end-to-end automation from the actual acceptance test framework. Should either change, it has minimal impact on the other.

The role of Jenkins in the Automation Environment is that of the Coordinator. Built artifacts for both the application and test clients are moved to a Staging Area in the environment. The Staging Area contains assets for the client machines (which include the target web browsers), the application servers, and the database. In addition, it can contain the scripts which coordinate actions between the various nodes of the automation environment.

Caveat

Jenkins currently does not allow projects to reference other workspaces. In some cases (such as with the application artifacts), the copy operation may be coupled with the application project. Depending on the size of your published artifacts, this may add considerable time to the build process. Also, non-build failures, such as the inability to push artifacts into the test environment, result in build warnings.

VMWare

There are generally three classes of machine in a 3-tier application environment:

  • Clients (consisting of different target OS/Browser combinations)
  • Application Servers (consisting of the supported OS/Application environments)
  • Database Servers (consisting of supported back-ends)

VMWare (as well as other virtualization platforms) enables having the entire application environment in a self-contained network. End-to-end testing can then use this environment to test against different combinations of the application stack. Tests which succeed in all combinations validate consistent behavior across the supported architectures.

The elegance of this solution is that it can easily be scaled by simply adding more VMs. Hardware is typically the constraining factor in virtualization solutions, which are highly dependent on memory and I/O. Modern multi-core processors are more than sufficient from the point of view of a client VM. CPU affinity and proper resource allocation can ensure that VMs are sufficiently provisioned for their role. There are a few principles to follow when designing your virtual infrastructure:

  1. Ideally, you have enough physical RAM to hold all VMs in memory without swapping.
  2. Drives with high IOPS reduce disk contention (important for Windows VMs, which like to swap).
  3. CPU allocation should not exceed the actual number of Cores/Threads.

In addition, VMWare supports virtual snapshots. Snapshots allow VMs to be restored consistently to a known state. This feature removes the need to physically start machines and wait for boot up. VMs are started and ready to run immediately. In addition, tests which make non-reversible changes to the database layer can have their changes undone by simply restoring the VM to its original snapshot. This ensures that tests can run against a known baseline environment. Variation between test runs is minimized, if not entirely eliminated.

Operations against VMs (such as starting and restoring snapshots) are supported by VMWare through the vmrun utility. In addition, vmrun allows commands to be executed within the guest VMs from the virtualization host. Scripted test execution from an external trigger is possible when combined with shell/Ant scripts on guest VMs. If these scripts are available on a common Staging Area as suggested in the previous section, all dependent guest VMs can use the same scripts between test executions.

In addition, vmrun can be used against snapshots. If testing against the same OS with slight variations, you can save space by taking snapshots of the core image with the variations applied. For example if you are testing Windows clients with various Service Packs, you can take a snapshot against each Service Pack state you wish to test against. When invoking a test, all you need to do is specify the appropriate snapshot.
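The vmrun operations described above can be sketched as a small command builder. This is only an illustration under stated assumptions: the class name, the .vmx path, the snapshot name, and the guest credentials are hypothetical, and the builder only assembles argument lists for the vmrun subcommands revertToSnapshot, start, and runProgramInGuest without executing anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds vmrun command lines; nothing is executed here. */
class VmrunCommands {
    // Value for vmrun's -T flag, e.g. "ws" for Workstation (assumed host type).
    private final String hostType;

    VmrunCommands(String hostType) {
        this.hostType = hostType;
    }

    private List<String> base() {
        return new ArrayList<>(Arrays.asList("vmrun", "-T", hostType));
    }

    /** Restores a VM to a named snapshot, e.g. a particular Service Pack state. */
    List<String> revertToSnapshot(String vmxPath, String snapshotName) {
        List<String> cmd = base();
        cmd.addAll(Arrays.asList("revertToSnapshot", vmxPath, snapshotName));
        return cmd;
    }

    /** Starts a VM without opening a console window. */
    List<String> start(String vmxPath) {
        List<String> cmd = base();
        cmd.addAll(Arrays.asList("start", vmxPath, "nogui"));
        return cmd;
    }

    /** Runs a program inside the guest using guest credentials. */
    List<String> runInGuest(String vmxPath, String guestUser, String guestPassword,
                            String program, String... args) {
        List<String> cmd = base();
        cmd.addAll(Arrays.asList("-gu", guestUser, "-gp", guestPassword,
                "runProgramInGuest", vmxPath, program));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }
}
```

Each returned list could be handed to `new ProcessBuilder(cmd).inheritIO().start()` on the virtualization host. Keeping command construction separate from execution makes the snapshot selection easy to check without a running VM.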

Supplementary Services

Since the virtual environment is a fully functional network, there are a few services which make managing VMs easier.

DNS/DHCP

Name resolution and network configuration can be unified using DHCP with dynamic DNS updates. This allows all machine IPs/names to be centralized in the dhcpd.conf file (assuming you are using a Unix/Linux host). The Workstation/Fusion/Server versions of VMWare natively support DNS/DHCP through their built-in virtual network interfaces for the NAT or Host-only networking configuration. In ESXi, this ability is not present as there are no virtual network interfaces; a dedicated host for managing this service is necessary.

It is strongly recommended that you avoid using static IPs for hosts as well as providing static host mapping files. This solution quickly becomes difficult to manage as you add more machines, because changes to the network have to be propagated to all hosts participating in the same network.

NTP

VMs are sensitive to time differences, especially if they attempt to synchronize their clocks against the host's internal clock. To avoid time drift between VMs, it is suggested that you use NTP to synchronize all VM clocks. By removing this source of variation, you can ensure that time-sensitive operations, such as establishing secure connections, happen reliably within the virtual environment. Time drift may affect secure connections as most security mechanisms protect against replay attacks by having a sufficiently small connection window.
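To make the effect of drift on a connection window concrete, here is a minimal sketch. It assumes a simplified mechanism in which the receiver accepts a message only if the sender's timestamp lies within a fixed window of its own clock; the class name and the five-minute window used in the usage note are illustrative, not taken from any particular protocol.

```java
import java.time.Duration;
import java.time.Instant;

class ReplayWindow {
    /** Accepts a message only if its timestamp is within the window of the receiver's clock. */
    static boolean accepts(Instant senderTimestamp, Instant receiverNow, Duration window) {
        // Absolute difference between the two clocks' view of "now".
        Duration skew = Duration.between(senderTimestamp, receiverNow).abs();
        return skew.compareTo(window) <= 0;
    }
}
```

With a five-minute window, a client VM whose clock has drifted by two minutes is still accepted, while one that has drifted by seven minutes is rejected even though the request itself is legitimate.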

HTTPS

There may be times when you are required to simulate connections over HTTPS. To ensure that invalid certificate messages do not interfere with client tests, create a self-signed certificate for the application server. Import this self-signed certificate as a root certificate authority on the client VMs. Connections should be established without security messages.

Ant

Scripts are the glue that binds the Coordinator (Jenkins in this example) to the VMs that participate in test execution. Automation scripts should be treated as code; include them with your version control system. In this example we use Ant, but certain tasks may be better served with DevOps tools such as Chef / Puppet.

Ant should be structured with a number of custom targets corresponding to the various phases of test execution. To kick off test execution, the Jenkins run automation project is started, which invokes an Ant script at the root of the Staging Area on the VM host responsible for managing the participating guest VMs. The phases are as follows:

Pre-automation

  1. Set the test build to use for the clients.
  2. Set the app build to use for applications.
  3. Execute application config pre-conditions such as setting configuration information.
  4. Start the VMs in the order of their dependencies; usually the database first, followed by the application server, and lastly the client VMs being used. Ant allows for the use of the elements <sequential> and <parallel>. Generally, database and application starts occur sequentially, but client VMs can be started in parallel.

Test Execution

  1. Execute tests on client VMs.
  2. Collect test artifacts and results and publish them to a known location.

Post-automation

  1. Suspend all VMs in parallel.
  2. Revert the VMs to their base snapshots. This step ensures that the VMs are ready for the next test execution run.
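The start ordering from the pre-automation phase can be sketched outside of Ant as well. The following is not the Ant script itself but a plain Java illustration of the same idea: dependent tiers start one after another, independent clients start concurrently. The class name and the `startVm` callback are hypothetical; in practice the callback would invoke vmrun.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class VmStartupOrder {
    /**
     * Starts the database, then the application server, then all clients concurrently.
     * Returns the VM names in the order in which their start calls completed.
     */
    static List<String> startAll(String database, String appServer,
                                 List<String> clients, Consumer<String> startVm) {
        List<String> started = Collections.synchronizedList(new ArrayList<>());

        // Sequential part: each tier depends on the one before it.
        for (String vm : Arrays.asList(database, appServer)) {
            startVm.accept(vm);
            started.add(vm);
        }
        if (clients.isEmpty()) {
            return started;
        }

        // Parallel part: client VMs do not depend on each other.
        ExecutorService pool = Executors.newFixedThreadPool(clients.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String vm : clients) {
                pending.add(pool.submit(() -> {
                    startVm.accept(vm);
                    started.add(vm);
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait, and surface any failure from a client start
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while starting client VMs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a client VM failed to start", e.getCause());
        } finally {
            pool.shutdown();
        }
        return started;
    }
}
```

The same shape applies in reverse for post-automation, where suspending all VMs has no ordering constraint and can be done entirely in parallel.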

There were a number of Ant/Ant-Contrib tasks that I used:

  • propertyregex - Performs regular expression operations on an input string, and sets the results to a property.
  • timestampselector - The TimestampSelector task takes either a nested <path> element, or a path reference, and sets either a named property, or a path instance to absolute pathnames of the files with either the N latest or earliest modification dates.
  • sshexec - Runs a command on a remote machine running SSH daemon.

Miscellanea

As I suggested previously, the use of a single Staging Area simplifies Continuous Deployment. When testing against different supported application servers, it allows a single copy to be used for configuration purposes. Each application server variant configures itself against the same build. Keeping a recent history of published artifacts in the Staging Area also allows for test run comparisons. By resetting a symbolic link to select the current build to use, you have the ability to troubleshoot between versions of the application/test builds.
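Resetting the symbolic link can be sketched as follows. This is a minimal illustration assuming a Staging Area directory that holds one sub-directory per published build plus a link named `current`; the class name, the link name, and the build directory names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CurrentBuildLink {
    /** Points stagingArea/current at the given build directory, replacing any existing link. */
    static Path select(Path stagingArea, String buildName) {
        Path link = stagingArea.resolve("current");
        try {
            // Deletes only the link itself, never the build it pointed to.
            Files.deleteIfExists(link);
            return Files.createSymbolicLink(link, stagingArea.resolve(buildName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `CurrentBuildLink.select(staging, "build-41")` and later `CurrentBuildLink.select(staging, "build-42")` switches every consumer of `current` between builds without copying any artifacts.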

TODO : Grid and Automation in the Cloud

To fully leverage the Grid, tests need the ability to run in parallel. There are a number of strategies that can be employed to achieve this, but I'll defer to a future article on Grid usage. There are a few experiments that I'd like to try first before committing to any design. In addition, the use of Grid should be consistent whether you are running it from your own environment (as above) or in the Cloud (via SauceLabs).

11
Aug
2011

The Big Picture : Part 3 - Organization is Key

Modularity

While tests should be comprehensive, usually only the test automation is required to run the complete suite of tests; test and application developers generally work on a subset of the entire framework. This is more important when test runs become time consuming. Immediate feedback encourages the use of tests; delays in receiving test results interrupt workflow and make it less likely that tests will be used concurrently with development. Ideally, testing can be initiated from within the IDE (Eclipse in this case) or the command line (whichever is preferable to the developer). Being able to support both modes of execution fulfills the different requirements of developers versus automation.

The design of the test execution encourages the use of Java Properties; these can have local values which are used for development, but which can be overridden when tests are executed in the context of automation. This article focuses on scriptable execution via Ant; other build tools such as Maven can also be used.

Test configuration parameters that should be made into Properties include:

  • application URL
  • username & password
  • browser type
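One way to layer these properties is sketched below, assuming three levels: built-in defaults, a developer's local values, and values supplied by automation. The class name, the property keys, and the default values are hypothetical.

```java
import java.util.Properties;

class TestConfig {
    /** Layers configuration: built-in defaults, then local values, then automation values. */
    static Properties resolve(Properties localOverrides, Properties automationOverrides) {
        Properties defaults = new Properties();
        defaults.setProperty("app.url", "http://localhost:8080/app");
        defaults.setProperty("app.username", "tester");
        defaults.setProperty("app.password", "secret");
        defaults.setProperty("browser.type", "firefox");

        // Lookups fall back to the defaults for any key not set explicitly.
        Properties resolved = new Properties(defaults);
        resolved.putAll(localOverrides);
        resolved.putAll(automationOverrides); // applied last, so automation wins
        return resolved;
    }
}
```

A developer would typically pass an empty set of automation values, while the automation run could pass `System.getProperties()` so that values given with `-D` on the command line take precedence.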

Reference Datasources

In addition to having modular tests, a reference database is essential. A reference database should provide only enough functionality to verify application behavior. This allows separating out configuration dependent tests from core functionality. By being able to differentiate between core and configuration, you have better separation of concerns. Failures in actual application behavior can be distinguished from misconfiguration errors. In addition, for test behavior to be replicable, the same test pre-conditions should be present for every test execution run. Using VMWare snapshot functionality or Oracle's Flashback capability allows you to ensure that your database is in a consistent known state when tests are run.

Test Organization

The organizational mechanism employed depends on the tools you use for test execution. Below, I discuss the use of JUnit and JBehave mechanisms for organizing tests into workable modules. In addition to using the tool's organizational support, it is also helpful to use package namespaces to modularize the physical test files.

JUnit

JUnit has the @RunWith(Suite.class) and @Suite.SuiteClasses class annotations which allow you to create a hierarchy of suites with one master suite (used by automation) to include other functionally distinct suites. Within these functional suites, I suggest separating out Reversible and Non-Reversible tests. Reversible tests do not make permanent database changes, while Non-Reversible tests do. Application developers can use the Reversible suite for quick smoke tests. Ideally, you will restore the reference database to its initial state after the execution of Non-Reversible tests. This ensures consistency between test runs. This is even more important if you are testing outputs based on fixed/known inputs and want to compare them with an expected output.

JBehave

JBehave's organization is based on the concept of a Story. A Story is a functionally cohesive set of scenarios. Scenarios are the actual behaviors which describe the story's functionality. In addition to using Stories, it is useful to also modularize the Stories themselves along functional application lines. Step classes and story files are required to run JBehave tests. Story files are simple text specifications which include everything necessary to describe the expected behavior given a set of pre-conditions, operational steps, and expected post-conditions. Step classes are responsible for mapping the scenario language to actual test automation commands. Both Steps and Stories can be reused. I suggest keeping them in separate directories. JBehave uses reflection to determine which stories to run via the storyPaths() method on JUnitStories. This uses an Ant-like path expression to find textual stories which JBehave inspects to invoke test methods. Using more specific path expressions in the various function-specific stories limits test execution to those stories. Conversely, a single JUnitStories class with a sufficiently general path expression should be able to find all stories in your project.

Versioning

Tests should be considered programmatic artifacts; like application code, they should be managed via a source code repository. Treating tests as code allows us to do all the normal things version control allows. Test file histories document the changes required as the application evolves. In addition, it is useful to be able to version test collections in relation to the application releases/branches. This allows for test development to proceed unhindered on development branches while supporting writing new test cases against the stable release (with the understanding that tests applicable to the development branch will be ported).

Test Context

Inevitably, you will have to reconstruct context to understand the reasons behind why a given test was written a certain way. Files that you used to troubleshoot the problem domain must be found again and inspected to fully understand the implications of a change. When this reconstruction happens is indeterminate; it might be tomorrow or a year from now. Unfortunately, your ability to reconstruct this context diminishes over time. In addition, there is a (high) probability that the person who may need to make sense of this context may be someone other than yourself.

What is needed is a mechanism to associate this context with the test. If you are using Eclipse, Mylyn's saved contexts in conjunction with the appropriate adapter for your defect system (such as Redmine/Trac/Bugzilla) solve this problem. By having the ability to associate a context with a defect (via a zipped file which contains your project layout), you remove the limitation of an individual's memory and instead leverage the automation system's abilities.

What's Next?

Now that I've established some basic principles for creating a test automation framework, I'll describe how to put it all together to support continuous deployment using Jenkins/Ant/VMWare. The goal is to provide a feedback system for application development. The sooner developers are aware of potential regressions caused by application changes, the quicker they can fix them.

10
Aug
2011

The Big Picture : Part 2 - Testing for the Common Man

What Are We Testing?

To be clear, the focus of this article is to discuss tests which involve various client web browsers which connect to the application. There are other kinds of tests that we will not cover here - unit tests and integration tests (at the service level). There are other tools which are suitable for those tests and they fall under the responsibility of the application developers themselves. More importantly, these types of tests are non-client facing.

What is important, however, is that tests cover the behavior of different versions and kinds of web browsers. While WebDriver supports HTMLUnit via the WebDriver interface, we focus primarily on browsers supported by the RemoteWebDriver interface. This includes the following browsers:

  • Firefox
  • Chrome
  • Internet Explorer
  • Opera

The only unsupported major browser is Safari. Writing tests against the RemoteWebDriver allows us to keep them browser agnostic. In addition, Grid 2 currently supports only RemoteWebDriver clients.

Everyone seems to have a different definition for particular kinds of tests. For the purpose of this discussion, I'd like to offer my own definitions; others, however, will likely disagree with these categorizations.

Acceptance Tests

All tests which ensure that the application behaves as expected can be considered Acceptance Tests. Within this categorization are other sub-categories.

Smoke Tests

Smoke Tests ensure that the application is technically functional. Generally, these types of tests ensure navigational and Element behaviors are consistent throughout the application. Validation at the Element level could be done programmatically as part of these tests. Smoke Tests are capable of verifying technical requirements, but generally at the expense of providing user context to the operations. These types of tests are useful for programmers, but not so much for end users.

Specification Tests

Specification Tests use the language of the user domain to express application behavior. The value of using specifications is that these tests create a consistent vocabulary which is used to communicate end-user expectations to programmers. Essentially, these specifications form the foundation of acceptance criteria which are codified in a readable manner. There are many different specification frameworks, but I chose JBehave for the simple fact that specifications are written in plain text. This makes it easy for non-technical end users to contribute to the creation of specifications. Features as well as bugs can be described via Stories and Scenarios.

Regression Tests

Regression Tests typically express defects within the application to which fixes have been applied. The difficulty with this type of test is usually a question of reproducibility. By using the same language used for Specification Tests, preconditions, steps to simulate the behavior, and expected outcomes are clearly expressed.

Performance Tests

Performance Tests are generally designed to exert load on the application back-end. There are tools such as JMeter which apply the record-playback principle for generating load, but tools such as this have limitations. If your application uses dynamically generated assets, these types of tests are difficult to use. In addition, it is questionable whether tests based on JMeter measure actual application behavior against a client or the ability of the application server to respond to concurrent requests.

Synthetic browser tools remove a critical part of the equation - the browser itself. While HTMLUnit can be used in this scenario as well, it is not a browser that end users are likely to use. Different browsers introduce variability (such as different JavaScript engines) that must be accounted for. This is more important for applications that are AJAX aware.

One and the Same

Code reuse is a fundamental principle in managing complexity in large software projects. As test coverage expands to cover more aspects of the AUT, leveraging this principle becomes crucial to manage the tests themselves.

Ideally, Specification and Regression Tests can be re-used for measuring performance. This allows for load testing using a single framework. This can be accomplished through the use of a concurrency demultiplexer that can take single Specification or Regression Tests and dynamically create multiple concurrent instances (determined at runtime) of the test to run. JUnit natively supports concurrency through the RunnerScheduler. You can effectively simulate group tests using this method.
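The idea of turning one test into many concurrent instances can be sketched with the JDK alone. Note that this is not JUnit's RunnerScheduler; it is a swapped-in illustration using a plain ExecutorService, and the class name and the representation of a test as a `Callable<Boolean>` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentTestFanOut {
    /** Runs the given number of copies of one test concurrently and returns how many passed. */
    static int run(Callable<Boolean> test, int instances) {
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < instances; i++) {
                results.add(pool.submit(test));
            }
            int passed = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (result.get()) {
                        passed++;
                    }
                } catch (ExecutionException e) {
                    // A test that throws counts as a failure.
                }
            }
            return passed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tests", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The instance count is an ordinary argument, so it can be chosen at runtime, for example from a property supplied by the automation run.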

This solution, however, requires a number of machines to run the tests. Virtualization can be used to provision client browser machines, but has its own costs. One must balance the costs of machine resources with human resources.

Domain-specific Languages (DSL) To The Rescue

Behavior Driven Development engages end users in defining application behavior. Instead of simple technical requirements driving application development, the perspective of the end user is given a central role. Specifications are written using the language of their domain; this provides the necessary context for considering the success or failure of a behavior. This is the role of the Domain-specific Language; JBehave is particularly useful in this regard.

JBehave allows the following:

  1. Specifications can be written in plain text.
  2. Specifications use a structured syntax that clearly expresses pre-conditions, steps, and expected outcomes.
  3. Specifications encourage stories to be written from the point of view of a User of an application. This allows specifications to be defined for different perspectives given a User's role within an application.
  4. Instead of expressing operations at the component (Element) level, tests describe functionality in the context of individual user operations (Scenarios) grouped by functionality (Stories).
  5. Stories are reusable as pre-conditions of other stories.
  6. Parameterized steps allow for table-driven input. This effectively removes the need for tools like Fitnesse. Both input and output can be specified with the test.

Caveat Emptor

While JBehave is generally useful for BDD, it is not without its faults. My initial attempt to use the GuiceAnnotatedEmbedderRunner was a complete failure. The documentation surrounding its use was insufficient to diagnose my issues. Barring a source-code evaluation of JBehave's implementation of this feature, I would recommend avoiding it.

In addition, I would suggest avoiding the JBehave-Selenium module. There are design flaws in its implementation which tightly couple test code to WebDriver. Pages created in the manner suggested by JBehave-Selenium inherit directly from WebDriver, thus breaking encapsulation. Changes in WebDriver APIs have a direct impact on the Page implementation. All the Page actually requires is that the driver be supplied through dependency injection, either via an injector framework or manually via constructor injection. As per the Gang of Four, favor composition over inheritance.

In addition, having a typed WebDriverProvider exposes the browser implementation details into the test framework; this should be safely abstracted away into a Configuration object.
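The composition approach can be sketched as follows. To keep the example self-contained, a hypothetical `Browser` interface stands in for the small slice of WebDriver that a page needs; `Configuration`, `LoginPage`, and the element ids are likewise illustrative names, not part of JBehave or WebDriver.

```java
/** Hypothetical stand-in for the small slice of WebDriver that a page needs. */
interface Browser {
    void open(String url);
    void type(String fieldId, String text);
    void click(String elementId);
}

/** Carries environment details so that pages and tests never name a concrete browser. */
class Configuration {
    final String baseUrl;

    Configuration(String baseUrl) {
        this.baseUrl = baseUrl;
    }
}

/** A page object that holds its browser (composition) instead of extending it. */
class LoginPage {
    private final Browser browser;
    private final Configuration config;

    // Constructor injection: the page receives its collaborators from outside.
    LoginPage(Browser browser, Configuration config) {
        this.browser = browser;
        this.config = config;
    }

    void logIn(String username, String password) {
        browser.open(config.baseUrl + "/login");
        browser.type("username", username);
        browser.type("password", password);
        browser.click("submit");
    }
}
```

Because the page only depends on the narrow interface it was given, a change in the underlying driver API is absorbed by whatever adapts the driver to that interface, and the page can be exercised with a recording fake instead of a real browser.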

The Grid

Grid allows tests to be distributed across machines for parallel execution. This is particularly useful for load testing as well as providing concurrent testing against various browser types and versions. When combined with different operating systems, this method provides an effective means of testing an application across a number of different platforms simultaneously. A single virtual machine can have multiple grid nodes which can consist of different browser types. The number of nodes per machine is limited only by memory. By having a central repository of available browser resources, scheduling concurrent test execution becomes manageable.

The structure of the automation environment plays a direct part in how Grid is used. I'll cover this in more detail in Part 4.

What's Next?

The number of tests against a mature application only increases over time. This requires a method for organizing the tests themselves. The next article deals with issues of test organization and how to introduce modularity and versioning to your tests.