Monday, February 17, 2014

PHP Code Analysis with NetBeans 7.4

NetBeans has become my IDE of choice for PHP in the last year or so. Version 7.4 shipped with support for PHP code analysis tools, namely PHPMD and PHP_CodeSniffer. Unfortunately, usage of these tools within the IDE is not well documented, so I wanted to share how they work.

Configuration

Installing the tools is up to you; NetBeans just uses the scripts you point to. I prefer using Composer's global option, but if you prefer PEAR or some other installation method, that's fine too.
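
For reference, installing both tools through Composer might look something like this (the package names are current as of this writing, and older Composer versions may want an explicit version constraint):

[rchouinard@beta ~]$ composer global require phpmd/phpmd squizlabs/php_codesniffer

The resulting scripts typically land in ~/.composer/vendor/bin, which is the location you'll point NetBeans at in the next step.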

Once installed, navigate to "Tools > Options > PHP > Code Analysis" inside of NetBeans and configure the paths to the scripts and the default rules you want to use.

Running

Now here's the part nobody really tells you: how to actually use these tools. From inside of NetBeans, navigate to "Source > Inspect..."

That's pretty much it. Select the scope and configuration to use and click "Inspect." If you configured everything correctly, the scripts will run, and after a few seconds (or minutes, depending on project size) you'll be presented with a nice results view.

Ignore Folders

You may want to instruct the IDE to ignore certain folders when running code analysis. For example, I use Composer to manage dependencies, and I don't want to see results for everything under my vendor directory. To ignore it, right-click on the project in the project pane, navigate to "Properties > Ignored Folders", and add the vendor folder under "Extra Folders for Code Analysis."

Tuesday, April 17, 2012

PHP Password Library

When I wrote my first Web application that required a user to log in, I stored the user's password in my database using a simple MD5 hash. It worked well, and I wasn't storing my passwords in plain text. Eventually I learned about rainbow tables, and suddenly using MD5 or SHA1 didn't seem like such a good idea. Even salted hashes using these functions are no match for tools like hashcat on modern hardware.

I've used a number of password hashing techniques over the years, and I even used Openwall's portable hashing library for a little while. While the library is certainly easy to work with, I didn't like that it was a monolithic class written for PHP 4 and offered very little control over the process. At this point, I was used to working with the modular components of Zend Framework. So, I did what most any other developer with some extra cycles to burn would do: I wrote my own password hashing library for PHP 5.

The library is modular and extensible. The current version is capable of creating password hashes using several widely recognized and recommended methods including bcrypt and PBKDF2. It also includes adapters for calculating password strength based on popular algorithms such as the one recommended by NIST.
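
If you just want to see the general idea behind one of those methods, here is a minimal bcrypt sketch using nothing but PHP's built-in crypt() function. To be clear, this is an illustration of the technique, not the library's own API, and it assumes PHP 5.3+ with the OpenSSL extension available:

<?php
// A minimal sketch of bcrypt hashing with PHP's built-in crypt().
// This illustrates the technique only; it is not the library's API.
$password = 'correct horse battery staple';

// Build a bcrypt salt: "$2a$" + two-digit cost + "$" + 22 chars from [./0-9A-Za-z]
$cost  = 10;
$bytes = openssl_random_pseudo_bytes(16);
$salt  = sprintf('$2a$%02d$%s', $cost, substr(strtr(base64_encode($bytes), '+', '.'), 0, 22));

$hash = crypt($password, $salt);

// To verify, re-hash the candidate password using the stored hash as the salt.
// (A constant-time comparison is preferable in production code.)
$valid = (crypt($password, $hash) === $hash);
var_dump($valid); // bool(true)

The library wraps this kind of logic up for you, so you don't have to manage salts and hash formats by hand.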

So check out my PHP Password Library, and feel free to report any issues or feedback on the GitHub project page.

Wednesday, October 27, 2010

Change of Scenery

A few weeks ago, I turned in my notice at BizJournals and informed my manager that I would be accepting a position at another company. It was a difficult decision to make, made even more difficult by the timing of project due dates and the need for resources to be on-hand. In the end, though, it was an opportunity I simply couldn't pass on. I met a lot of great people at BizJournals, and I hope to maintain many of the connections made there. It was a good place to work, and the development staff there was top-notch.

That said, I've moved on to working at Oracle Corporation with the MySQL.com Web team. I'm now working from my home office, so the lack of direct human contact through the day will take some getting used to. I'm sure I'll be able to pull a blog post or two out of that...

Tuesday, August 17, 2010

Battle of the APIs: REST vs. RPC

It seems that every time I talk to another developer about building an API, there tends to be some confusion on terminology and implementation details. While many will say that they are implementing a RESTful Web service, when they start detailing the implementation it becomes clear they really mean RPC. I'm not sure where the confusion stems from, as both approaches are pretty well documented. Still, I'm amazed at how few REST implementations get it right.

Both REST and RPC have their place, and may even coexist within a single project. They simply provide different ways of accessing things, and each has its own strengths and weaknesses which make it more or less suitable for a given purpose.

What is RPC?

RPC stands for Remote Procedure Call, and is used to call a function or method exposed on a remote server. When discussing RPC, most developers tend to be referring to either XML-RPC or SOAP. Both are simply protocols, and either can serve as a completely valid RPC implementation. I've also seen many developers happily roll their own implementations as well.

Most RPC implementations use XML or JSON to pass messages or payloads between the client and the server. The request message would contain the name of the remote method to call, along with the method arguments. The server would then respond with a message containing the return value of the method and any other messages.

If an API uses SOAP, XML-RPC, or URLs like http://example.com/api/method?arg1=foo, it is (most likely) an RPC interface. There are many ways to implement an RPC API, but in the last case of a home-grown API, requests might look like those below.

To request a list of user accounts in XML format from an RPC API, the request URI might be similar to this:

http://example.com/api/getUserList?format=xml

The API should send back an XML payload of user accounts, or it may respond with some sort of error message instead. In either case, a 200 OK status code will probably be returned.

The same API may allow the creation of a user account, using either GET or POST (broken apart for readability):

http://example.com/api/createUser \
    ?format=xml \
    &firstName=Example \
    &lastName=User \
    &displayName=Example+User \
    &passwordHash=5f4dcc3b5aa765d61d8327deb882cf99h238

In this case, the API would probably respond in much the same way, with an XML payload containing either a success flag or an error string.
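
To round out the picture, calling that hypothetical getUserList method from PHP could be as simple as the sketch below. The endpoint and the element names in the response are assumptions based on the example above:

<?php
// Hypothetical client for the RPC-style API described above.
$response = file_get_contents('http://example.com/api/getUserList?format=xml');

// Parse the XML payload; the <user> and <displayName> element names are assumed.
$xml = simplexml_load_string($response);
foreach ($xml->user as $user) {
    echo $user->displayName, PHP_EOL;
}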

What is REST?

The term REST stands for Representational State Transfer, and it's a pretty popular buzzword on the Internet. A RESTful API presents application objects as Web resources, and uses standard HTTP methods and headers to pass information back and forth.

Communication with a REST endpoint is mostly done using the GET, POST, PUT, and DELETE HTTP methods. These roughly translate to read, create, edit, and delete actions, respectively. Parameters are passed using HTTP request headers or query strings, and responses can be returned in a number of formats (XML and JSON being very common). HTTP status codes are used to indicate the outcome of the request and the state of the resource.

A RESTful request to return a list of user accounts in XML format may look like this:

GET /api/users HTTP/1.0
Accept: text/xml

The response may be a 200 OK status with an XML payload, or it might be a 401 Unauthorized if the API requires authentication but the credentials were not included with the request.

Creating a new user resource is also very easy, as shown below:

POST /api/users HTTP/1.0
Accept: text/xml
Content-Type: text/xml
Content-Length: 227

<?xml version="1.0" encoding="UTF-8"?>
<user>
    <firstName value="Example" />
    <lastName value="User" />
    <displayName value="Example User" />
    <passwordHash value="5f4dcc3b5aa765d61d8327deb882cf99h238" />
</user>

If the new user resource was created successfully, the server may respond with a 201 Created status, and a Location header pointing to the newly created resource.

HTTP/1.0 201 Created
Location: http://example.org/api/users/31337

A request to the URI provided in the Location header should return a payload containing the new user record.
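
For comparison, here is a rough PHP sketch of the user-creation request above using the cURL extension; the endpoint and payload simply mirror the hypothetical example:

<?php
// Hypothetical client for the REST-style user-creation example above.
$payload = '<?xml version="1.0" encoding="UTF-8"?>
<user>
    <firstName value="Example" />
    <lastName value="User" />
    <displayName value="Example User" />
</user>';

$ch = curl_init('http://example.com/api/users');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: text/xml', 'Content-Type: text/xml'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true); // include response headers so we can read Location

$response = curl_exec($ch);
$status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($status == 201 && preg_match('/^Location:\s*(.+)$/mi', $response, $matches)) {
    echo 'Created: ', trim($matches[1]), PHP_EOL;
}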

Conclusion

Hopefully at this point the basic differences between the two API types are a little clearer. In my opinion, one interface type is not inherently superior to the other, and both can be used very effectively given proper implementations (and documentation!).

While I centered my examples here around a fictional user API, which translates well into both interface types, I think it's important to point out that some operations don't map nearly as well. One such example is an API used to perform some sort of action, such as processing a payment. This type of action works very well in an RPC interface:

http://example.com/api/processPayment \
    ?format=xml \
    &amount=313.37 \
    &cardNo=4111111111111111

If you know how to do this in a RESTful way, please share! ;-)

Sunday, July 18, 2010

Subversion Branches & You

The topic of managing a Subversion repository comes up a lot around the office. The development team where I work is still getting used to the idea of branching, and with that come the ideas of merging, rebasing, and reintegrating. I figured I'd go over some of these concepts for anyone else who may be searching.

The typical layout of a Subversion repository is to have three directories (sometimes called resources) under the root: trunk, branches, and tags. These directories serve to hold the mainline development, experimental or incomplete development, and release code, respectively. While most everybody who has used Subversion is familiar with trunk, some are not clear on the purpose of the other two.

To fully explain branches and tags, I'd like to start first with trunk. When it comes to source control, I operate under a simple philosophy: Trunk is always stable.

This means that under source control, the trunk resource should always be in a working state. A user or developer should be able to check out the contents of trunk and work with it right away. However, stable in this context does not mean that APIs or functionality can't change from revision to revision; it simply means that the code base should work without breaking. This is an important distinction.

Of course, leaving trunk in this state while working on large change sets and experimental updates conflicts with another established philosophy: Commit early, commit often. So, where do these changes go? Branches, of course!

Branches are basically temporary copies of trunk where a developer can work, commit, and break things as they like. Once the changes in a branch are stable, the branch can be reintegrated into trunk and then deleted. This keeps trunk clean and stable, while allowing the developer to retain the benefits of revision control. A project can have as many unique branches as the developer or team likes. Branch names don't really have any rules, but most teams have some sort of preferred convention.

When a project is ready for a release, a special type of branch is usually created, called a tag. A tag can be a straight copy of trunk, serving as a marker in the project's revision timeline, or it can be a modified version of the code base derived from a build system. Tags are usually named for the release version they represent. Since tags represent a fixed release, they are not supposed to be modified once they are created. While Subversion itself has no concept of tags (it treats them like any other branch), the established convention is that they are permanent, and users do not expect them to change. Most third-party repository management tools will warn you if you are about to modify a tag.

A typical active Subversion repository may look like this:
/
    branches/
        rchouinards_branch/
        feature_195/
        foo/
    tags/
        1.0/
        1.1/
        1.2/
    trunk/

Let's say a developer - we'll call him Bob - is working on the project shown above. Bob is assigned Bug #209, and after reviewing it decides that the changes required to fix the bug warrant creating a branch. Bob creates his new branch from trunk, names it bug_209, and switches his working copy to it. As Bob is working on his ticket, other developers are busy committing to their branches, and some are even committing to trunk. After a while, Bob decides he needs to check his code against the current trunk. To do so, Bob merges the changes from trunk into his working copy, and if everything looks good, commits those changes back into his branch. Bob has rebased his branch from trunk.

A little while later, Bob decides that his work is done, and the bug is fixed as confirmed by his regression tests. Bob switches his working copy to trunk, and merges the changes from his branch to his working copy. One last successful run of the test suite later, Bob commits his working copy into trunk, deletes his branch, and marks Bug #209 as ready to deploy. Bob has just reintegrated his branch into trunk.
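
In terms of actual commands, Bob's workflow might look roughly like this. The repository URL is made up for illustration, and your client version may change the exact options available:

# Create the branch and switch the working copy to it
svn copy http://svn.example.com/repo/trunk \
    http://svn.example.com/repo/branches/bug_209 \
    -m "Create branch for bug 209"
svn switch http://svn.example.com/repo/branches/bug_209

# Rebase: pull the latest trunk changes into the branch working copy
svn merge http://svn.example.com/repo/trunk
svn commit -m "Merge latest trunk changes into bug_209"

# Reintegrate: fold the finished branch back into trunk
svn switch http://svn.example.com/repo/trunk
svn merge --reintegrate http://svn.example.com/repo/branches/bug_209
svn commit -m "Reintegrate bug_209 into trunk"
svn delete http://svn.example.com/repo/branches/bug_209 -m "Remove merged branch"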

Bob also happens to be the release master for his team, which makes him responsible for creating tags. Since Bug #209 was prioritized critical, he needs to push the fix into production as soon as possible. Bob uses the team's build system to make sure trunk is stable and ready to deploy, and then creates a release tag from the build output, which he names 1.2.1. Bob then uses the team's deployment tools to verify the tag and push the code out into production. Hooray!
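
The tag itself is just another copy. For the simple straight-copy-of-trunk case, it might be created like this (again, the repository URL is made up):

svn copy http://svn.example.com/repo/trunk \
    http://svn.example.com/repo/tags/1.2.1 \
    -m "Tag release 1.2.1"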

In Bob's case, the fact that his team uses branches and tags efficiently allowed him to easily deploy the application into a production environment. Hopefully, you now have a little better understanding of what branches are and how a good branching strategy comes into play during the testing and deployment phases of development.

I'm still working on a post that will describe a bit more of the magic that happens when Bob runs his build and deployment tools. :-)

Saturday, July 10, 2010

PHP Application Lifecycle: Unit Testing

The concept of unit testing is nothing new, but unfortunately it seems to still be rare among PHP developers. I believe it's not because developers don't think testing is a good idea, but instead that they think testing is hard, or makes development times longer. I actually used to be one of those developers.

I think that most developers would agree that testing is a good thing, and we should all be doing it. Some developers like to test their work simply by calling it in another bit of code (or reloading the page in the browser) and observing the results. The problem with this approach is that it is inflexible. While it may work fine on smaller bits of code, larger classes and objects that interact with other objects may not be as easy to test. Most of the time, these quick one-off tests assume perfect conditions, which isn't always the case.

By using a testing framework, a developer can quickly build test cases for a bit of code, and run those tests as they continue to make changes to the code in order to make sure nothing gets broken. In fact, this type of testing is called regression testing, and is only one type of test a developer can create. The most common types of tests are:
Smoke Test
The first, simple test against a new bit of code. These are used to check the code for expected behavior with valid input.
Regression Test
A set of tests written to verify fixes for specific bugs or usage scenarios. For example, if a method expects a string and causes a bug when given an integer, a regression test should be written first - one that fails - to verify the presence of the problem. The code should then be fixed to make the test pass. Regression tests are then used to make sure a bug is not re-introduced in future code revisions.
Integration Test
More advanced testing which checks the interaction between two or more portions of code. An integration test might be written to make sure a library is properly writing data to the database.
Behavior Test
Another more advanced testing methodology in which the test isn't concerned so much with the result, but how the code works internally. If a bit of code is expected to log data to a file, a behavior test will call that bit of code, and watch for the proper call to the log method.

PHP has two main unit testing tools: SimpleTest by Marcus Baker, and PHPUnit by Sebastian Bergmann. SimpleTest's Website hasn't been updated in a while, and I'm not sure of the state of the tool. PHPUnit is the most widely accepted, and is compatible with the xUnit family of testing tools. I use and will focus on PHPUnit for this discussion. PHPUnit supports all the test types outlined above, but for brevity I'm only going to review a simple smoke test.

Let's assume a simple class which provides a few math-based methods. It may look like this:
<?php
class Calculator
{
    public function add($first, $second)
    {
        return (int) $first + (int) $second;
    }

    public function subtract($first, $second)
    {
        return (int) $first - (int) $second;
    }

    public function multiply($first, $second)
    {
        return (int) $first * (int) $second;
    }

    public function divide($first, $second)
    {
        return (int) $first / (int) $second;
    }
}

A quick one-off test for this may look like this:
<?php
$calc = new Calculator;

echo "add(): ";
// Should output "4"
echo $calc->add(2, 2);
echo PHP_EOL;

echo "subtract(): ";
// Should output "2"
echo $calc->subtract(4, 2);
echo PHP_EOL;

echo "multiply(): ";
// Should output "10"
echo $calc->multiply(5, 2);
echo PHP_EOL;

echo "divide(): ";
// Should output "5"
echo $calc->divide(10, 2);
echo PHP_EOL;

Output would look like this:
[rchouinard@beta ~]$ php testCalc.php
add(): 4
subtract(): 2
multiply(): 10
divide(): 5

This approach seems simple, but some problems become apparent as development on the Calculator class continues. For starters, the test script doesn't really indicate what the test is checking for. The person invoking the script must know what output is expected in order to tell if the test passed or failed. This test script can be rewritten as a PHPUnit test case very easily:
<?php
require_once 'Calculator.php';
require_once 'PHPUnit/Framework/TestCase.php';

class CalculatorTest extends PHPUnit_Framework_TestCase
{
    private $calc;

    protected function setUp()
    {
        parent::setUp();
        $this->calc = new Calculator;
    }

    protected function tearDown()
    {
        $this->calc = null;
        parent::tearDown();
    }

    public function testAdd()
    {
        $this->assertEquals(4, $this->calc->add(2, 2));
    }

    public function testSubtract()
    {
        $this->assertEquals(2, $this->calc->subtract(4, 2));
    }

    public function testMultiply()
    {
        $this->assertEquals(10, $this->calc->multiply(5, 2));
    }

    public function testDivide()
    {
        $this->assertEquals(5, $this->calc->divide(10, 2));
    }
}

Running PHPUnit against this file gives us output that is easy to read and understand:
[rchouinard@beta ~]$ phpunit CalculatorTest.php
PHPUnit 3.5.0beta1 by Sebastian Bergmann.

....

Time: 0 seconds, Memory: 1.00Mb

OK (4 tests, 4 assertions)

If one of the assertions had failed, we would get output like this:
[rchouinard@beta ~]$ phpunit CalculatorTest.php
PHPUnit 3.5.0beta1 by Sebastian Bergmann.

.F..

Time: 0 seconds, Memory: 1.00Mb

There was 1 failure:

1) Calculator::testSubtract
Failed asserting that <integer:...> matches expected <integer:2>.

/home/rchouinard/working/CalculatorTest.php:29

FAILURES!
Tests: 4, Assertions: 4, Failures: 1.

Hopefully some of the benefits of a testing framework are apparent now. Our test code doesn't have to deal with output, and we have immediate pass/fail feedback without having to know what values the test is expecting. PHPUnit even tells us exactly what went wrong and caused the test to fail.
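
As a quick illustration of the regression test type mentioned earlier, imagine we find a bug where divide() blows up when given a zero divisor. A hypothetical regression test, assuming we decide Calculator::divide() should throw an InvalidArgumentException in that case, might look like this:

<?php
require_once 'Calculator.php';
require_once 'PHPUnit/Framework/TestCase.php';

class CalculatorRegressionTest extends PHPUnit_Framework_TestCase
{
    public function testDivideByZeroThrowsException()
    {
        // Hypothetical: assumes Calculator::divide() is updated to throw
        // an InvalidArgumentException when the divisor is zero.
        $this->setExpectedException('InvalidArgumentException');

        $calc = new Calculator;
        $calc->divide(10, 0);
    }
}

The test fails first, the fix goes in, and the test sticks around to make sure the bug never comes back.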

This has been a very simple intro to PHPUnit, and doesn't even begin to scratch the surface of what PHPUnit is capable of. I would encourage you to take a look at the PHPUnit documentation to learn more. For a working example of a PHPUnit setup, take a look at my PHP component library.

In coming posts, I'll discuss integrating PHPUnit into other tools for some truly powerful code analysis.

Wednesday, July 7, 2010

PHP Application Lifecycle: Build vs. Deploy

Over the past few days, I've been playing with Phing, a build tool for PHP similar to Ant for Java. I initially intended to use Phing to deploy a Web application I'm developing, but I've come to realize its power as a build tool while working on a set of PHP libraries as well. Along the way, I've come to appreciate the important distinction between a build tool, such as Phing, and a deployment tool.

Many articles and tutorials around the Internet tend to focus on using Phing for three things:
  • Kicking off automated tests with PHPUnit
  • Building API documentation with PHPDocumentor
  • Deploying code to a Web server
While Phing is definitely suited for these tasks, and indeed comes with many built-in tasks for these exact purposes, it can be used — and is intended to be used — to do so much more.

In the PHP world, where code is not compiled, the primary purpose of a build tool is to prepare a code base for distribution. This usually means changing configuration files, cleaning out development-only artifacts, and packaging the code base in a neat little tarball, PEAR package, or other archive. While Phing and other build tools often have built-in support for simple deployments via rsync, scp, and other file transport mechanisms, they typically don't support truly robust deployment features.
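
To make that a little more concrete, a minimal Phing build file for packaging a library might look something like the sketch below. The project name, target names, paths, and filesets are all assumptions for illustration; check the Phing documentation for the tasks your version supports:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal build sketch: clean, copy the library into a build directory, and tar it up. -->
<project name="example-library" default="package" basedir=".">
    <property name="build.dir" value="build" />

    <target name="clean" description="Remove previous build artifacts">
        <delete dir="${build.dir}" />
    </target>

    <target name="prepare" depends="clean" description="Copy distributable files">
        <mkdir dir="${build.dir}" />
        <copy todir="${build.dir}">
            <fileset dir=".">
                <include name="library/**" />
                <exclude name="tests/**" />
            </fileset>
        </copy>
    </target>

    <target name="package" depends="prepare" description="Create the release tarball">
        <tar destfile="${build.dir}/example-library.tar.gz" compression="gzip">
            <fileset dir="${build.dir}">
                <include name="library/**" />
            </fileset>
        </tar>
    </target>
</project>

Running "phing package" would produce the tarball, and that archive is the artifact you'd hand off to whatever deployment tooling you use.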

In many environments, especially those with only a single Web server, the simple mechanisms provided by the build tools may be just fine. However, in more complex environments, these methods quickly break down, and deployment should be handled by a dedicated utility. That's not to say that Phing doesn't belong in these environments; on the contrary, this is where the real power of Phing can shine through. By taking the code base from its development environment, transforming it, testing it, and packaging it, a build tool can pass off the prepared code to the deployment system with a higher level of confidence in the code than before.

Some time in the future, I'll discuss a little about how I'm managing my code with Phing, my deployment solutions, and introduce my other new friend, the Hudson continuous integration server.