Sunday, July 18, 2010

Subversion Branches & You

The topic of managing a Subversion repository comes up a lot around the office. The development team where I work is still getting used to the idea of branching, and with that come the ideas of merging, rebasing, and reintegrating. I figured I'd go over some of these concepts for anyone else who may be searching for an explanation.

The typical layout of a Subversion repository has three directories (sometimes called resources) under the root: trunk, branches, and tags. These directories hold the mainline development, experimental or incomplete development, and release code, respectively. While almost everybody who has used Subversion is familiar with trunk, some are not clear on the purpose of the other two.

To fully explain branches and tags, I'd like to start first with trunk. When it comes to source control, I operate under a simple philosophy: Trunk is always stable.

This means that under source control, the trunk resource should always be in a working state. A user or developer should be able to check out the contents of trunk and work with it right away. However, stable in this context does not mean that APIs or functionality can't change from revision to revision; it simply means that the code base should work without breaking. This is an important distinction.

Of course, leaving trunk in this state while working on large change sets and experimental updates conflicts with another established philosophy: Commit early, commit often. So, where do these changes go? Branches, of course!

Branches are basically temporary copies of trunk where a developer can work, commit, and break things as they like. Once the changes to a branch are stable, the branch can be reintegrated into trunk, and the branch deleted. This keeps trunk clean and stable, while allowing the developer to retain the benefits of revision control. A project can have as many unique branches as the developer or team likes. Branch names don't really have any rules, but most teams have some sort of preferred convention.

When a project is ready for a release, a special type of branch is usually created, called a tag. A tag can be a straight copy of trunk, serving as a marker in the project's revision timeline, or it can be a modified version of the code base derived from a build system. Tags are usually named for the release version they represent. Since tags represent a fixed release, they are not supposed to be modified once they are created. While Subversion itself has no concept of tags (it treats them like any other branch), the established convention is that they are permanent, and users do not expect them to change. Some third-party clients, TortoiseSVN for example, will warn you if you are about to commit to a tag.

A typical active Subversion repository may look like this:
/
    branches/
        rchouinards_branch/
        feature_195/
        foo/
    tags/
        1.0/
        1.1/
        1.2/
    trunk/

Let's say a developer, whom we'll call Bob, is working on the project shown above. Bob is assigned Bug #209, and after reviewing it decides that the changes required to fix the bug warrant creating a branch. Bob creates his new branch from trunk, names it bug_209, and switches his working copy to it. As Bob is working on his ticket, other developers are busy committing to their branches, and some are even committing to trunk. After a while, Bob decides he needs to check his code against the current trunk. To do so, Bob merges the changes from trunk into his working copy and, if everything looks good, commits those changes back into his branch. Bob has rebased his branch from trunk. (The Subversion documentation calls this keeping a branch in sync; "rebase" is a term borrowed from other version control tools.)

A little while later, Bob decides that his work is done, and the bug is fixed as confirmed by his regression tests. Bob switches his working copy to trunk, and merges the changes from his branch to his working copy. One last successful run of the test suite later, Bob commits his working copy into trunk, deletes his branch, and marks Bug #209 as ready to deploy. Bob has just reintegrated his branch into trunk.

Bob also happens to be the release master for his team, which makes him responsible for creating tags. Since Bug #209 was prioritized critical, he needs to push the fix into production as soon as possible. Bob uses the team's build system to make sure trunk is stable and ready to deploy, and then creates a release tag from the build output, which he names 1.2.1. Bob then uses the team's deployment tools to verify the tag and push the code out into production. Hooray!

In Bob's case, the fact that his team uses branches and tags efficiently allowed him to easily deploy the application into a production environment. Hopefully, you now have a better understanding of what branches are, and how a good branching strategy comes into play during the testing and deployment phases of development.

I'm still working on a post that will describe a bit more of the magic that happens when Bob runs his build and deployment tools. :-)

Saturday, July 10, 2010

PHP Application Lifecycle: Unit Testing

The concept of unit testing is nothing new, but unfortunately it seems to still be rare among PHP developers. I believe it's not because developers don't think testing is a good idea, but instead that they think testing is hard, or makes development times longer. I actually used to be one of those developers.

I think that most developers would agree that testing is a good thing, and we should all be doing it. Some developers like to test their work simply by calling it in another bit of code (or reloading the page in a browser) and observing the results. The problem with this approach is that it is inflexible. While it may work fine on smaller bits of code, larger classes and objects that interact with other objects may not be as easy to test. Most of the time, these quick one-off tests assume perfect conditions, which isn't always the case.

By using a testing framework, a developer can quickly build test cases for a bit of code, and run those tests as they continue to make changes to the code in order to make sure nothing gets broken. In fact, this type of testing is called regression testing, and is only one type of test a developer can create. The most common types of tests are:
Smoke Test
The first, simple test against a new bit of code. These are used to check the code for expected behavior with valid input.
Regression Test
A set of tests written to verify and fix specific bugs or usage scenarios. For example, if a method expects a string, and causes a bug if given an integer, a regression test should be written first to confirm the problem; at this point the test fails. The code should then be fixed to make the test pass. Regression tests are then used to make sure a bug is not re-introduced in future code revisions.
Integration Test
More advanced testing which checks the interaction between two or more portions of code. An integration test might be written to make sure a library is properly writing data to the database.
Behavior Test
Another more advanced testing methodology in which the test isn't concerned so much with the result as with how the code works internally. If a bit of code is expected to log data to a file, a behavior test will call that bit of code, and watch for the proper call to the log method.
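To make the behavior test idea a little more concrete before getting into any framework, here is a minimal sketch in plain PHP. The Logger interface, the SpyLogger test double, and the Importer class are all hypothetical names I made up for illustration; a testing framework would normally generate the test double for you as a mock object.

```php
<?php
// Hypothetical logger contract; not part of any real library.
interface Logger
{
    public function log($message);
}

// Test double that records every message instead of writing to a file.
class SpyLogger implements Logger
{
    public $messages = array();

    public function log($message)
    {
        $this->messages[] = $message;
    }
}

// Hypothetical code under test: it is expected to log each import it performs.
class Importer
{
    private $logger;

    public function __construct(Logger $logger)
    {
        $this->logger = $logger;
    }

    public function import($file)
    {
        // ... the real work would happen here ...
        $this->logger->log("Imported {$file}");
        return true;
    }
}

// The behavior test: ignore the return value and check the log call itself.
$spy = new SpyLogger;
$importer = new Importer($spy);
$importer->import('users.csv');

if ($spy->messages !== array('Imported users.csv')) {
    throw new Exception('Importer did not log the expected message');
}
echo "behavior test passed", PHP_EOL;
```

Note that the check never looks at what import() returns; it only cares that the logger was called once with the right message.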

PHP has two main unit testing tools: SimpleTest by Marcus Baker, and PHPUnit by Sebastian Bergmann. SimpleTest's website hasn't been updated in a while, and I'm not sure of the state of the tool. PHPUnit is the more widely adopted of the two, and is a member of the xUnit family of testing tools. I use and will focus on PHPUnit for this discussion. PHPUnit supports all the test types outlined above, but for brevity I'm only going to review a simple smoke test.

Let's assume a simple class which provides a few math-based methods. It may look like this:
<?php
class Calculator
{

    public function add($first, $second)
    {
        return (int) $first + (int) $second;
    }

    public function subtract($first, $second)
    {
        return (int) $first - (int) $second;
    }

    public function multiply($first, $second)
    {
        return (int) $first * (int) $second;
    }

    public function divide($first, $second)
    {
        return (int) $first / (int) $second;
    }

}

A quick one-off test for this may look like this:
<?php
// Load the class under test
require_once 'Calculator.php';

$calc = new Calculator;

echo "add(): ";
// Should output "4"
echo $calc->add(2, 2);
echo PHP_EOL;

echo "subtract(): ";
// Should output "2"
echo $calc->subtract(4, 2);
echo PHP_EOL;

echo "multiply(): ";
// Should output "10"
echo $calc->multiply(5, 2);
echo PHP_EOL;

echo "divide(): ";
// Should output "5"
echo $calc->divide(10, 2);
echo PHP_EOL;

Output would look like this:
[rchouinard@beta ~]$ php testCalc.php
add(): 4
subtract(): 2
multiply(): 10
divide(): 5

This approach seems simple, but some problems become apparent as development on the Calculator class continues. For starters, the test script doesn't really indicate what the test is checking for. The person invoking the script must know what output is expected in order to tell if the test passed or failed. This test script can be rewritten as a PHPUnit test case very easily:
<?php
require_once 'Calculator.php';
require_once 'PHPUnit/Framework/TestCase.php';

class CalculatorTest extends PHPUnit_Framework_TestCase
{

    private $calc;

    protected function setUp ()
    {
        parent::setUp();
        $this->calc = new Calculator;
    }

    protected function tearDown ()
    {
        $this->calc = null;
        parent::tearDown();
    }

    public function testAdd ()
    {
        $this->assertEquals(4, $this->calc->add(2, 2));
    }

    public function testSubtract ()
    {
        $this->assertEquals(2, $this->calc->subtract(4, 2));
    }

    public function testMultiply ()
    {
        $this->assertEquals(10, $this->calc->multiply(5, 2));
    }

    public function testDivide ()
    {
        $this->assertEquals(5, $this->calc->divide(10, 2));
    }
}

Running PHPUnit against this file gives us easy to read and understand output:
[rchouinard@beta ~]$ phpunit CalculatorTest.php
PHPUnit 3.5.0beta1 by Sebastian Bergmann.

....

Time: 0 seconds, Memory: 1.00Mb

OK (4 tests, 4 assertions)

If one of the assertions had failed, we would get output like this:
[rchouinard@beta ~]$ phpunit CalculatorTest.php
PHPUnit 3.5.0beta1 by Sebastian Bergmann.

.F..

Time: 0 seconds, Memory: 1.00Mb

There was 1 failure:

1) CalculatorTest::testSubtract
Failed asserting that matches expected .

/home/rchouinard/working/CalculatorTest.php:29

FAILURES!
Tests: 4, Assertions: 4, Failures: 1.

Hopefully some of the benefits of a testing framework are apparent now. Our test code doesn't have to deal with output, and we have immediate pass/fail feedback without having to know what values the test is expecting. PHPUnit even tells us exactly what went wrong and caused the test to fail.
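The regression tests described earlier follow the same pattern. As a rough sketch, suppose someone reports that divide() misbehaves when the divisor is zero. The SafeCalculator class below is a hypothetical fixed version (the name and the choice of InvalidArgumentException are my assumptions, not part of the Calculator class above), and the test case uses the same PHPUnit 3.5-style class names as the rest of this post. It is meant to be run with the phpunit command, which loads the framework itself; newer PHPUnit releases name the base class PHPUnit\Framework\TestCase instead.

```php
<?php
// Hypothetical fixed version of divide(): rejects a zero divisor up front.
class SafeCalculator
{
    public function divide($first, $second)
    {
        if ((int) $second === 0) {
            throw new InvalidArgumentException('Cannot divide by zero');
        }
        return (int) $first / (int) $second;
    }
}

// Assumes PHPUnit 3.5-style class names, as used elsewhere in this post.
class SafeCalculatorRegressionTest extends PHPUnit_Framework_TestCase
{
    // Written against the bug report: fails until divide() guards the divisor.
    public function testDivideByZeroIsRejected()
    {
        $calc = new SafeCalculator;
        try {
            $calc->divide(10, 0);
        } catch (InvalidArgumentException $e) {
            $this->assertEquals('Cannot divide by zero', $e->getMessage());
            return;
        }
        $this->fail('Expected InvalidArgumentException for a zero divisor');
    }

    // Guards against the fix breaking the normal case.
    public function testDivideStillWorks()
    {
        $calc = new SafeCalculator;
        $this->assertEquals(5, $calc->divide(10, 2));
    }
}
```

Once the fix is in, this test stays in the suite so the bug can't quietly come back in a later revision.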

This has been a very simple intro to PHPUnit, and doesn't even begin to scratch the surface of what PHPUnit is capable of. I would encourage you to take a look at the PHPUnit documentation to learn more. For a working example of a PHPUnit setup, take a look at my PHP component library.

In coming posts, I'll discuss integrating PHPUnit into other tools for some truly powerful code analysis.

Wednesday, July 7, 2010

PHP Application Lifecycle: Build vs. Deploy

Over the past few days, I've been playing with Phing, a build tool for PHP, similar to Ant for Java. I initially intended to use Phing to deploy a Web application I'm developing, but I've come to realize its power as a build tool while working on a set of PHP libraries as well. Through my time with the tool, I've realized the important distinction between a build tool, such as Phing, and a deployment tool.

Many articles and tutorials around the Internet tend to focus on using Phing for three things:
  • Kicking off automated tests with PHPUnit
  • Building API documentation with phpDocumentor
  • Deploying code to a Web server
While Phing is definitely suited for these tasks, and indeed comes with many built-in tasks for these exact purposes, it can be used — and is intended to be used — to do so much more.

In the PHP world, where code is not compiled, the primary purpose of a build tool is to prepare a code base for distribution. This usually means changing configuration files, cleaning out development-only artifacts, and packaging the code base in a neat little tarball, PEAR package, or other archive. While Phing and other build tools often have built-in support for simple deployments via rsync, scp, and other file transport mechanisms, they typically don't support truly robust deployment features.
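To give a feel for the "cleaning out development-only artifacts" step, here is a small plain PHP sketch. It is not how Phing does it (Phing expresses this kind of thing with filesets and exclude patterns in its XML build file); the selectDistributableFiles() function, the file names, and the patterns are all hypothetical and only illustrate the idea.

```php
<?php
// Hypothetical helper: returns the paths that should ship in a distribution,
// dropping anything that matches a development-only pattern.
function selectDistributableFiles(array $paths, array $excludePatterns)
{
    $keep = array();
    foreach ($paths as $path) {
        $excluded = false;
        foreach ($excludePatterns as $pattern) {
            // fnmatch() does shell-style wildcard matching
            if (fnmatch($pattern, $path)) {
                $excluded = true;
                break;
            }
        }
        if (!$excluded) {
            $keep[] = $path;
        }
    }
    return $keep;
}

// Made-up project contents and exclusion rules for the example
$paths = array(
    'library/Calculator.php',
    'tests/CalculatorTest.php',
    'build.xml',
    '.svn/entries',
    'README',
);
$exclude = array('tests/*', '.svn/*', 'build.xml');

// Keeps library/Calculator.php and README
print_r(selectDistributableFiles($paths, $exclude));
```

A real build would go on to copy the surviving files into a build directory and package them, but the filtering step is where most of the "development vs. distribution" decisions get made.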

In many environments, especially those with only a single Web server, the simple mechanisms provided by the build tools may be just fine. However, in more complex environments, these methods quickly break down, and deployment should be handled by a dedicated utility. That's not to say that Phing doesn't belong in these environments; on the contrary, this is where the real power of Phing can shine through the most. By taking the code base from its development environment, transforming it, testing it, and packaging it, a build tool can pass off the prepared code to the deployment system with a higher level of code confidence than before.

Some time in the future, I'll discuss how I'm managing my code with Phing, cover my deployment solutions, and introduce my other new friend, the Hudson continuous integration server.