Saturday 10 November 2007

Test Driven Refactoring

As you probably know, the third part of the TDD cycle is refactoring. After the Green phase your code works, but it works only for the particular set of values you used in your tests, and it is ugly. The first step is to remove the hardcoded values so that your code works in the general case. Strictly speaking, this is not refactoring, since you change the behavior of the system. The second part is the one everybody likes: moving functionality around, creating lots of small classes (ravioli code), and injecting dependencies wherever you can. Lots of fun.

The question is, should I write tests for this or not? Writing tests would mean nesting further three-step cycles inside the first one, with their own refactoring steps, perhaps turning into a fractal-like structure. What's worse, the extra tests would pin the design down and make it hard to change later. Writing no tests would mean the class behavior goes undocumented. Recently I chose the latter path, but I got lost in the refactoring process and forgot what I was aiming for in the long run. Besides, being a relative novice and not having a pair (for programming), I'd like to have some guidance. The refactoring process ends up not so much test-driven as test-constrained, since I don't add any new tests while refactoring; the design is driven only by my own ideas of what a nice design looks like.

Enter Test Driven Refactoring.

The initial idea was to remove the duplicated parts of my tests. Here's an example. Suppose your customer wants the following:
If a user is logged in, she sees the "Edit my profile" and "Logout" buttons.
Next, we ask what "logged in" means, and we come up with another requirement:
If a user is registered, she can log in entering her username and password at the login screen.
Naturally, we ask what "registered" means, and here's another requirement:
After a user has entered her username and password at the registration screen, she can log in using the same username and password.

Let's write the tests for the first user story:
1. If a user has entered "test" and "password" into the username and password boxes at the registration screen, then the same words at the login screen, she sees the "Edit my profile" button.
2. If a user has entered "test" and "password" into the username and password boxes at the registration screen, then the same words at the login screen, she sees the "Logout" button.
Each test would probably involve a complicated setup just to check a simple boolean value at the end. Later we'll probably add other tests covering the various pieces of functionality available to logged-in members. On the other hand, there might be other ways to become logged in, and for each of them we have to write yet another two tests verifying that the two buttons are visible, and perhaps that other functionality is available.
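To make the duplication concrete, here is a minimal sketch of what the two tests might look like in JUnit. The `App` facade and its methods (`registerUser`, `login`, `isEditProfileButtonVisible`, `isLogoutButtonVisible`) are hypothetical names invented for this illustration:

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class LoggedInButtonsTest {

    // Both tests repeat the same expensive setup: register, then log in.
    @Test
    public void loggedInUserSeesEditProfileButton() {
        App app = new App();                  // hypothetical facade
        app.registerUser("test", "password"); // registration screen
        app.login("test", "password");        // login screen
        assertTrue(app.isEditProfileButtonVisible());
    }

    @Test
    public void loggedInUserSeesLogoutButton() {
        App app = new App();
        app.registerUser("test", "password");
        app.login("test", "password");
        assertTrue(app.isLogoutButtonVisible());
    }
}
```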

We quickly notice that our tests multiply much faster than features are added: every new way of becoming logged in has to be crossed with every piece of logged-in functionality. In addition, we notice that the setup portion of our tests is too big, which is a well-known code smell. So, how do we deal with it using the Test Driven Refactoring approach?

Suppose we are past the Green phase, so all the required functionality is there. The Refactoring phase is now divided into three steps. The first is to get rid of the hardcoded constants "test" and "password", so that any username and password work, not just these. The second step is the most interesting one. We notice that all the buttons need, in order to be visible or not, is a single boolean value. Now we see what "Excessive Setup" means: instead of providing a boolean input, we have provided much more information, most of which is simply thrown away.
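As a sketch of where the first step might leave us, assuming the hypothetical `App` facade from the tests above keeps registrations in a plain map (an invented implementation detail):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical App facade after step one: no hardcoded "test"/"password"
// anywhere; any pair stored at registration can log in.
public class App {
    private final Map<String, String> registeredUsers = new HashMap<>();
    private boolean loggedIn;

    public void registerUser(String username, String password) {
        registeredUsers.put(username, password);
    }

    public void login(String username, String password) {
        // equals(null) is false, so unknown usernames simply fail to log in
        loggedIn = password.equals(registeredUsers.get(username));
    }

    public boolean isEditProfileButtonVisible() { return loggedIn; }
    public boolean isLogoutButtonVisible()      { return loggedIn; }
}
```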

So, we decide that there should be two components, A and B: A's responsibility is to provide a boolean value in some form to B, and B's responsibility is to show or hide the buttons. The boolean value can take any form: an interface with a single bool property, a variable, whatever. The point is that we can now test A and B separately: first we verify that A provides the correct value, then we verify that B uses it correctly.
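Here is a minimal sketch of one possible shape for this split; the names `LoginState`, `SessionManager`, and `Toolbar` are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// The boolean contract between A and B: an interface with a single
// bool property, as suggested above.
interface LoginState {
    boolean isLoggedIn();
}

// Component A: owns registration and authentication, exposes only the boolean.
class SessionManager implements LoginState {
    private final Map<String, String> registeredUsers = new HashMap<>();
    private boolean loggedIn;

    void registerUser(String username, String password) {
        registeredUsers.put(username, password);
    }

    void login(String username, String password) {
        loggedIn = password.equals(registeredUsers.get(username));
    }

    @Override
    public boolean isLoggedIn() {
        return loggedIn;
    }
}

// Component B: decides button visibility from the boolean alone.
class Toolbar {
    private final LoginState state;

    Toolbar(LoginState state) {
        this.state = state;
    }

    boolean isEditProfileButtonVisible() { return state.isLoggedIn(); }
    boolean isLogoutButtonVisible()      { return state.isLoggedIn(); }
}
```

The payoff shows up in B's tests: instead of the full registration-and-login setup, a one-line stub is enough.

```java
@Test
public void toolbarShowsButtonsToLoggedInUsers() {
    Toolbar toolbar = new Toolbar(() -> true); // stub LoginState: logged in
    assertTrue(toolbar.isEditProfileButtonVisible());
    assertTrue(toolbar.isLogoutButtonVisible());
}
```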

Recalling the first user requirement, we notice that the functionality of B (and the corresponding test) bears a close resemblance to it. Could we have introduced the boolean value from the very beginning? Probably yes, but there are several disadvantages. First, we wouldn't have written the initial test, so we couldn't be sure that A and B are integrated correctly. Second, it would be upfront design, which is a sin. And last, user requirements are sometimes much vaguer, and it is hard to map them to classes and structures. Recently I wrote a simple search engine, and one of the requirements was that the content should be indexed for faster searches. This requirement would have been hard to implement head-on, and the typical TDD process with some obvious refactoring led me in a wrong direction. After properly applying TDR, I came up with an intermediate structure (the one that A passes to B) that was identical to the traditional representation of indexed content in a database.

After the TDR part, you are free to return to "chaotic" or "free-flow" refactoring to achieve an even better design. But this third step should be done without writing new tests.

Why is this better than the free-flow approach? It turns out that the TDR process gives us a set of components and responsibilities that correlate closely with the client's requirements. Thus there is a good chance that the tests won't break when you decide to improve your design. These tests are not quite unit tests, since you will probably want to split the components into smaller objects; but they reflect the client's idea of the functionality, not your idea of good design.
