There are a few misconceptions regarding Selenium / WebDriver that should be clarified from the start. When I first began to use Selenium (pre-1.0), the mechanism for automation relied on the technique of Proxy Injection. While this worked well for applications that rendered largely static content, the widespread use of AJAX made the technique very unreliable from an automation point of view. Timing was difficult to manage and required client-side JavaScript to notify the automation of the application's ready state. In addition, it was impossible to account for browser events that were not represented in the DOM (such as JavaScript alerts).
Enter WebDriver 2. Instead of using Proxy Injection, WebDriver automates actions using a browser's native events. This allows for more reliable timing mechanisms and makes it possible to catch native browser events. Overall, this is a great leap forward for automation.
Beware Selenium-IDE
From my perspective, there is one other caveat: the Selenium IDE, which should be avoided at all costs. While useful for small, relatively static sites, Selenese tests result in massive duplication, brittle abstractions, and tests that do not encourage reuse. Capture-and-record tools such as the Selenium IDE are seductively simple to use but extremely difficult to maintain. Changes to the application require changes to the tests themselves, and because the same steps are recorded in many tests, this violation of the DRY principle quickly becomes unmanageable. Tools such as the Selenium IDE can therefore be classified as semi-automatic: they require a lot of manual intervention when the application changes. Ultimately, semi-automated testing methodologies are doomed to fail simply because the underlying system they depend upon changes. Any system which does not account for application evolution can only capture the requirements at a given point in time.
WebDriver is an API
By favoring programmatic automation, you can leverage general programming principles to create a framework which accounts for the natural evolution of the application under test (AUT).
Which brings us to the crux of the matter: WebDriver is an API, not the solution. What does this mean? WebDriver is the means by which automation is achieved, but by itself it does not give structure to the problem of automating applications in general. The relationships between application components are loosely defined and have no inherent structure in WebDriver. In some cases, the pre-packaged abstractions are insufficient to reflect the complex relationships between application components.
This is the role of the framework: to support automation using reusable abstractions against an evolving application.
Automation as a goal does not happen in a vacuum. Applications must be constructed to support automation. As such, coordination with application architects is crucial to support any effort. The framework must be flexible enough to support clearly defined tests whose implementations may change as the application evolves, but whose intent remains the same. To this end, I will describe some of the design decisions I made when creating my testing framework.
Web Application Automation Concepts
The Reusable Element
Most applications reuse GUI controls; this must be reflected in any testing framework. While WebDriver supports finding generic WebElements and manipulating them via additional actions, test code that relies on these methods must be aware of implementation details. It also introduces duplication as soon as you have more than one element of a given type. By creating an Element abstraction, you can define similar application components by their locators as well as by the operations they support for automation. For example, a Text Box is different from a Dropdown: one cannot select items from a Text Box, but both may allow text input. Common behaviors need to be defined in a single representation, while type-specific behaviors need to be differentiated.
Having a reusable Element abstraction allows you to do nifty things like automatically validating a control based on its type. This is particularly useful for smoke tests. Changes to an Element's behavior can also be propagated easily throughout the entire testing framework if that behavior is expressed canonically in a single representation. Elements also allow for functional composition. By taking two or more fundamental Elements, you can compose testable aggregate Elements with increasingly sophisticated behavior which still behave as a single functional unit.
You can also localize procedural concerns, such as when an Element is resolved. Ideally, you want to resolve any given Element just before it is used. This minimizes the DOM inconsistencies that arise in applications which re-render output on post-backs.
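As a rough illustration of the Element idea, here is a minimal Java sketch. It is not taken from my framework: DomNode, Element, TextBox, Dropdown and FakeNode are hypothetical names, and DomNode merely stands in for WebDriver's WebElement so that the example runs without a browser. Text entry is defined once on Element, selection exists only on Dropdown, and the underlying node is resolved on each use rather than cached.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a located DOM node. In a real framework this
// role would be played by WebDriver's WebElement.
interface DomNode {
    void type(String text);
    String value();
    List<String> options();
    void choose(String option);
}

// Behavior common to every control is defined once, here.
abstract class Element {
    private final String locator;
    private final Supplier<DomNode> resolver;

    Element(String locator, Supplier<DomNode> resolver) {
        this.locator = locator;
        this.resolver = resolver;
    }

    String locator() { return locator; }

    // Resolved on every use and never cached, so a re-rendered page
    // cannot leave the Element holding an outdated node.
    protected DomNode resolve() { return resolver.get(); }

    void enterText(String text) { resolve().type(text); }

    String text() { return resolve().value(); }
}

class TextBox extends Element {
    TextBox(String locator, Supplier<DomNode> resolver) { super(locator, resolver); }
}

class Dropdown extends Element {
    Dropdown(String locator, Supplier<DomNode> resolver) { super(locator, resolver); }

    // Type-specific behavior: only a Dropdown can select an item.
    void select(String option) {
        DomNode node = resolve();
        if (!node.options().contains(option)) {
            throw new IllegalArgumentException("No option '" + option + "' at " + locator());
        }
        node.choose(option);
    }
}

// In-memory node used only to exercise the sketch without a browser.
class FakeNode implements DomNode {
    private final List<String> options;
    private String value = "";

    FakeNode(String... options) { this.options = Arrays.asList(options); }

    public void type(String text) { value = text; }
    public String value() { return value; }
    public List<String> options() { return options; }
    public void choose(String option) { value = option; }
}
```

A test would hold a Dropdown and call select("Green") on it; it never sees the node behind it. Because the Supplier is invoked on each operation, the lookup happens just before use.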
XPath is Regex for the DOM
Location strategies are determined by application structure. Ideally, every element has a consistent ID that is unique within its page and stable between application invocations. A lot of applications, however, do not meet these criteria. Although WebDriver supports a number of different location strategies, the most powerful by far is XPath.
Tools such as XPather for Firefox allow you to select elements via XPath, but unfortunately use only positional element expressions (such as table[1]/tr[3]/td[2]). Not only are these expressions difficult to read, but they are heavily reliant on the ordering of the DOM. This makes them brittle.
What is needed is a way to specify DOM path expressions which are rooted in the application's vernacular and disambiguate Elements effectively. By leveraging the expressiveness of XPath, you can specify Elements relative to other Elements. This is useful when labels are distinct from the components they decorate. In addition, the annotation method of supplying locators for Elements in WebDriver precludes the use of templates to find Elements of a given type which vary only by their identifying characteristic. XPath expressions are plain strings, so simple String templates can be used to build parameterized Element locators.
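To make the template idea concrete, here is a small Java sketch. The Locators class, its template and the sample markup are hypothetical. To keep the example runnable without a browser, the expression is evaluated with the JDK's built-in XPath engine against an XML string; with WebDriver, the same string would be passed to By.xpath instead.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class Locators {
    // Template: the first input that follows a label with the given text.
    // The locator is phrased in the application's own vocabulary (the
    // label) rather than in terms of DOM position.
    static final String INPUT_BY_LABEL =
        "//label[normalize-space(.)='%s']/following-sibling::input[1]";

    // Note: a label containing an apostrophe would break this simple
    // template; a real implementation would need to escape it.
    static String inputByLabel(String labelText) {
        return String.format(INPUT_BY_LABEL, labelText);
    }

    // Demonstration helper: returns the id attribute of the node matched
    // by the XPath expression within the given well-formed markup.
    static String idOf(String markup, String xpath) {
        try {
            XPath engine = XPathFactory.newInstance().newXPath();
            return engine.evaluate(xpath + "/@id",
                                   new InputSource(new StringReader(markup)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
    }
}
```

Given a form in which each label is followed by its input, Locators.inputByLabel("Last name") locates the second field without saying that it is the second. If the fields are reordered, the locator still works.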
The expressiveness of XPath comes at a price. Generally, using XPath for element location is slower than other methods. In addition, not all browsers support XPath natively (Internet Explorer for example). That being said, XPath provides a strategy for taking an existing application and making it amenable to automation. As the application evolves to support unique persistent IDs, these changes can be made globally at the Element abstraction.
For an excellent tutorial on XPath, see the documentation at ZVON.org.
The Reusable Page
Essentially a Page is a container for Elements which are manipulated via automation. Ideally, Elements associated with a Page should be lazy-initialized on use. In addition, a Page serves as a navigational component. To test something, you need to know where it is and how to get to it. Relationships between pages which define navigational structure depend on the type of application you are trying to automate.
The Hierarchical Application
In a Hierarchical Application, each page is located only once in the navigational structure. This type of structure is amenable to programmatic page traversal. In addition, if the application constructs page hierarchies programmatically (as it should), this information can be extracted from the application and the Page relationships can be created via code generation. The Generation Gap pattern is particularly useful in this regard. C#'s partial classes, together with its ability to nest classes, make the language well suited to this problem.
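For readers working in Java, which has no partial classes, the Generation Gap pattern is usually expressed through inheritance: the generator owns a base class, and hand-written code lives in a subclass that is never regenerated. The sketch below is only an illustration; SettingsPageBase, SettingsPage, the path and the child names are all hypothetical, and no particular generator is implied.

```java
import java.util.Arrays;
import java.util.List;

// GENERATED: in this sketch, a (hypothetical) generator would overwrite
// this class each time the application's page hierarchy is re-extracted.
abstract class SettingsPageBase {
    String path() { return "/admin/settings"; }

    List<String> children() { return Arrays.asList("Users", "Roles"); }
}

// HAND-WRITTEN: never touched by the generator, so custom behavior
// survives regeneration of the base class.
class SettingsPage extends SettingsPageBase {
    boolean canNavigateTo(String child) {
        return children().contains(child);
    }
}
```

When a page is added to or removed from the hierarchy, only the generated base class changes; helpers such as canNavigateTo pick up the new structure without being edited.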
The Process Oriented Application
By far the more difficult to automate, the Process Oriented Application has no clear notion of location; Page relationships are defined in the context of a given process. A wizard-based application is the stereotypical Process Oriented Application. This type of application is not well suited to programmatic page traversal, simply due to the fact that Pages may have circular dependencies. In this case, it may be difficult to automate the creation of Page relationships.
The Case Against WebDriver.PageFactory
PageFactory relies on defaults for the Element lookup strategy that may not be appropriate for the AUT. The use of the @FindBy annotation also makes it difficult to create dynamic, parameterized Element lookups. Modifying annotations at runtime requires reflection, which is both cumbersome and expensive. In addition, it is questionable whether caching WebElements via @CacheLookup is useful, given the possibility of a StaleElementReferenceException.
Instead of the PageFactory, Pages should express their dependencies explicitly in their constructors and hold a lazily-initialized dictionary of Elements with keys based on the language present in the application. If used in combination with the Element abstraction described previously, Element initialization is delegated back to the Element when accessed, not to the Page. The Page's element dictionary provides a mechanism for finding Elements; nothing more. Pages constructed in this manner can be invoked directly or through the use of a dependency injector.
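A minimal Java sketch of such a Page follows. The names (Element, Page, LoginPage, define, element) and the locators are hypothetical, and the Element here is reduced to a locator holder. The Page receives its one dependency, an element factory, through its constructor, and creates each dictionary entry only on first access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, minimal stand-in for the framework's Element abstraction.
class Element {
    private final String locator;

    Element(String locator) { this.locator = locator; }

    String locator() { return locator; }
}

abstract class Page {
    private final Map<String, Supplier<Element>> definitions = new HashMap<>();
    private final Map<String, Element> created = new HashMap<>();

    // Registers how to build an Element; nothing is built yet.
    protected final void define(String name, Supplier<Element> factory) {
        definitions.put(name, factory);
    }

    // The dictionary only finds Elements. Each one is created on first
    // access and reused afterwards; resolving it against the DOM remains
    // the Element's own job.
    Element element(String name) {
        Supplier<Element> factory = definitions.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No element named '" + name + "' on this page");
        }
        return created.computeIfAbsent(name, key -> factory.get());
    }
}

class LoginPage extends Page {
    // The dependency is explicit: anything that can turn a locator into
    // an Element. Keys use the wording shown in the application.
    LoginPage(Function<String, Element> elements) {
        define("User name", () -> elements.apply("//input[@id='username']"));
        define("Password", () -> elements.apply("//input[@id='password']"));
        define("Sign in", () -> elements.apply("//button[normalize-space(.)='Sign in']"));
    }
}
```

Because the factory arrives through the constructor, the same LoginPage can be built by hand in a test, as in new LoginPage(Element::new), or supplied by a dependency injector.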
Putting it All Together
The Case For the Use of Fluent Interfaces
From a programming perspective, it is useful to think of the automation framework as serving different clients. Some programmers are responsible for wiring up the framework to the application; others are responsible for wiring up tests to the framework.
These are two different tasks whose difficulty can be mitigated through the use of Fluent Interfaces. Page and Element definitions clearly express their requirements. Page navigation and Element access read as descriptions of intent rather than as a series of programmatic operations on application component primitives.
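The following Java sketch hints at what a fluent test-facing interface could look like. Flow and its methods are hypothetical names. To stay self-contained, each call only records a step, where a real framework would delegate to Pages and Elements. Every method returns the same object, which is what lets the calls chain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Flow {
    private final List<String> steps = new ArrayList<>();
    private String page;

    private Flow(String page) {
        this.page = page;
        steps.add("open " + page);
    }

    // Entry point: names the page where the scenario starts.
    static Flow on(String page) { return new Flow(page); }

    // Each step returns this, so steps can be chained.
    Flow enter(String element, String text) {
        steps.add(page + ": enter '" + text + "' into " + element);
        return this;
    }

    Flow click(String element) {
        steps.add(page + ": click " + element);
        return this;
    }

    // Records the expected navigation and makes it the current page.
    Flow expectPage(String next) {
        page = next;
        steps.add("arrive at " + next);
        return this;
    }

    List<String> steps() { return Collections.unmodifiableList(steps); }
}
```

A test then reads close to the scenario it describes: Flow.on("Login").enter("User name", "alice").click("Sign in").expectPage("Dashboard").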
Degrees of Freedom
In words commonly attributed to Einstein:
"Everything should be made as simple as possible, but not simpler."
None of these abstractions is meant to introduce unnecessary complexity; they exist to manage the inherent complexity of automating an application. Application testing must be able to respond to various degrees of freedom, each of which can destabilize test outcomes. The ultimate goal is reproducibility of the test's intent in the face of change. The following are the kinds of change to which a framework must be resilient.
The Application Changes: An Element is added/modified/removed
Adding an element for automation simply requires associating it with a given Page. When an element is modified (such as when it is superseded by a new control with more advanced functionality), it need only be changed in a single location. Changes cascade throughout the entire framework with little work.
The Application Changes: A Page is added/modified/removed
If you programmatically determine page relationships, then simply running the code-generation component will create a stub for a new page or remove references to a deleted page. More often, however, applications evolve by modifying existing pages. Elements are added to or removed from pages; this should happen independently of Element evolution. Pages should simply bind the Elements that are part of their scope of responsibility.
The Automation Framework Changes
While the solution presented here hinges on the use of WebDriver, there is a case to be made for framework isolation. All software evolves, and WebDriver is no exception. An automation framework built on WebDriver should also isolate tests from changes to WebDriver itself. Leaking implementation details into tests by directly referencing WebDriver primitives results in fragility when the WebDriver API changes. Ideally, Page or Element bindings should not be directly impacted by changes to the underlying API.
Interface inheritance, combined with implementation delegation to wrap primitive calls, works well to isolate the automation framework from WebDriver changes. In essence, the Element object behaves much like a WebDriver WebElement without exposing any internal implementation details. This also allows the original WebDriver API to be extended with custom helper methods and interfaces.
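The Java sketch below shows one way to combine interface inheritance with delegation. All of the names are hypothetical. VendorElement is a stand-in for the third-party primitive (WebDriver's WebElement in practice), with method names only loosely modeled on it, so that the example compiles on its own. Tests depend on Input and ValidatedInput; only DelegatingInput knows about the vendor type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the third-party primitive being wrapped.
interface VendorElement {
    void clear();
    void sendKeys(String keys);
    String getAttribute(String name);
}

// Framework-owned interface: the only type that tests refer to.
interface Input {
    void setText(String text);
    String text();
}

// Interface inheritance adds a custom helper on top of the basic contract.
interface ValidatedInput extends Input {
    boolean isEmpty();
}

// Implementation delegation: every operation is forwarded to the wrapped
// primitive, which is never exposed to callers.
class DelegatingInput implements ValidatedInput {
    private final VendorElement delegate;

    DelegatingInput(VendorElement delegate) { this.delegate = delegate; }

    public void setText(String text) {
        delegate.clear();
        delegate.sendKeys(text);
    }

    public String text() { return delegate.getAttribute("value"); }

    public boolean isEmpty() { return text().isEmpty(); }
}

// In-memory primitive used only to exercise the sketch without a browser.
class FakeVendorElement implements VendorElement {
    private final Map<String, String> attributes = new HashMap<>();

    FakeVendorElement() { attributes.put("value", ""); }

    public void clear() { attributes.put("value", ""); }
    public void sendKeys(String keys) { attributes.put("value", attributes.get("value") + keys); }
    public String getAttribute(String name) { return attributes.get(name); }
}
```

If the vendor API renames or changes a method, only DelegatingInput needs to change; tests written against ValidatedInput are untouched.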
The Target Web Browsers Change
There's a good chance that at some point you will have to test your application against different browsers. To prepare for this eventuality, tests should be created in a browser-agnostic fashion. No test should depend on a specific browser; all automation operations should be done through the RemoteWebDriver/WebElement. By doing so, not only will you be able to run your tests against other browsers, but you will also be able to accommodate future browser updates as support for them is added to WebDriver.
What's Next?
Despite having the ability to automate testing, it is infeasible to test everything. Not all tests are equal. The most valuable tests reflect actual application usage. This is the role of specification testing. In the next article, I'll talk about how to use JBehave to fill this role.