 www.icosaedro.it 

 Worth the effort? Testing PHPLint in production

Last updated: 2017-02-06

This is a short article about my experience applying PHPLint to existing software, so that you can figure out what to expect when using it in a production environment. Some basic knowledge of PHPLint and its features is assumed, especially its type model, error handling, and checked and unchecked exceptions.

There are basically two types of source programs:

  1. Libraries, that is, collections of functions and classes.
  2. Simple web pages that use the tools above.

Case 1: Libraries, the hardest part

Case 1 is the most difficult part of the work. Most existing software libraries, even those that look carefully written, require deep changes in order to pass validation. This initial phase might be quite discouraging, with PHPLint reporting hundreds of "errors" on an otherwise perfectly working piece of software. The picture below illustrates what happens with the popular FPDF library just downloaded. The vertical bar to the right is completely red ("errors") with some orange lines ("warnings"):


FPDF first pass on PHPLint: ouch!


It took me a couple of hours to get rid of 90% of those errors. Some are really simple to fix. For example, numbers must start with a digit, so .5 must be rewritten as 0.5. The pseudo-code statement /*. require_module 'standard'; .*/ is practically mandatory in any source that uses the standard functions of PHP. Another common problem is the dynamically calculated path of the required packages:

require_once dirname(__FILE__) . '/include/xxx.php';

which can easily be replaced throughout the source, using the text editor, with a statically evaluated path PHPLint can understand, based on the magic constant __DIR__ available since PHP 5.3.0 (June 2009!):

require_once __DIR__ . '/include/xxx.php';
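The same search-and-replace can also be scripted. The following is only a minimal sketch, assuming the sources always use the exact spelling dirname(__FILE__); the function name fixRequirePaths is hypothetical and not part of PHPLint:

```php
<?php
/**
 * Replaces every literal "dirname(__FILE__)" with "__DIR__".
 * Hypothetical helper for illustration; it only matches the exact spelling.
 * @param string $source PHP source code to convert.
 * @return string Converted source code.
 */
function fixRequirePaths($source)
{
	return str_replace('dirname(__FILE__)', '__DIR__', $source);
}

$before = "require_once dirname(__FILE__) . '/include/xxx.php';";
echo fixRequirePaths($before), "\n";
```

Variants with extra spaces or different letter case would need a regular expression instead of a plain string replacement.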

Some line-tags in DocBlocks are not supported, like @class (redundant), @protected, @private, @public and @static (PHP 5 already has its own keywords for these). They are useless anyway for both PHPLint and phpDocumentor and can be removed. This action alone gets rid of most of the errors.

Several "variable variables" like $this->$mm must be replaced in some way, generally using objects rather than variable names.

Often the exact type of the properties is not specified, and even where it is specified it is a generic "array" or "object", which tells nothing. Here one must figure out what these types really are, guessing from the surrounding comments (when available), from their usage, or simply by trial and error.

Another step is sorting functions and methods in bottom-up order to fix the boring "undefined function" error: PHPLint, being a single-pass parser, requires them to be ordered that way (programmers mostly do the reverse). This may take a bit more time and might be quite tedious. I agree that this is a limitation of the current version of PHPLint, but nothing is perfect in this world.
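As an illustration of the bottom-up order, here is a small sketch with two hypothetical functions: the callee twice() comes first, so it is already known when the parser reaches its caller quadruple():

```php
<?php
/**
 * Callee: defined first, so it is already known further down.
 * @param int $x
 * @return int
 */
function twice($x)
{
	return 2 * $x;
}

/**
 * Caller: defined after the function it uses.
 * @param int $x
 * @return int
 */
function quadruple($x)
{
	return twice(twice($x));
}

echo quadruple(3), "\n";
```

PHP itself accepts either order; the ordering matters only for the single-pass validation described above.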

At that point only about 10% of the initial errors might remain, typically:

- Missing, incomplete or invalid @param and @return, often with fictional, never heard before types, or simply wrong:

@param $i Number.   missing type, it must be "@param TYPE $V Comments."
@param $p int       type and var name order reversed
@param array $a     array with obscure content
@param number $n    "number" undefined; if both "int" and "double" allowed, just set "double"
@return strings     what is it? maybe an array of strings?
@return array       again, array of what?
@return             "return" what? if nothing, just write "@return void" or nothing at all

- For most PHP programmers all scalar types are just the same thing, so "string", "int" and "double" are used interchangeably. Here it might be hard to figure out what the program is really doing with these data, and at first I have to guess by trial and error.

- Even harder to fix are the associative arrays used as generic containers of data. Sometimes a private class has to be defined to replace them.

- Final step: adding the missing error handling for functions that may trigger errors or throw exceptions. Most existing software simply ignores these issues and keeps crunching data blindly even if an error occurred. It is here that the PHPLint paranoia about error handling starts to pay off, making the program safer.

- Type consistency checks, flow analysis, and error and exception propagation control may reveal bugs never discovered before but still present even in the most popular software around.
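To make the points about DocBlocks and associative arrays more concrete, here is a sketch of a small class taking the place of an array like array('name' => ..., 'size' => ...), with every parameter and return value given a specific type. The class FontInfo and the function describeFont() are hypothetical examples, not taken from FPDF, and the sketch has not been run through PHPLint:

```php
<?php
/**
 * Holds the data formerly kept in an associative array such as
 * array('name' => ..., 'size' => ...). Hypothetical example class.
 */
class FontInfo
{
	/** @var string */
	public $name = "";

	/** @var double */
	public $size = 0.0;

	/**
	 * @param string $name Font family name.
	 * @param double $size Font size in points.
	 */
	public function __construct($name, $size)
	{
		$this->name = $name;
		$this->size = $size;
	}
}

/**
 * Returns a readable description of the font.
 * @param FontInfo $font Font to describe.
 * @return string Family name followed by the size in points.
 */
function describeFont($font)
{
	return $font->name . " " . (string) $font->size . "pt";
}

echo describeFont(new FontInfo("Arial", 12.0)), "\n";
```

With a class, a misspelled field name becomes a detectable error instead of a silent new array key.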

Doing all this requires time. Much time. Fixing the FPDF class, for example, took me some days, and the result is now available along with PHPLint. You may look at its source now (here and click on the SOURCE link at top-right) and compare it with the original version. You may notice some private classes at the very beginning of the source: these private classes replace some otherwise obscure associative arrays whose contents were hard to figure out and impossible to validate.

Now the source of the FPDF class is fully documented, passes the PHPLint validation, all declared types match their use, and errors are either handled or thrown as exceptions. Worth the effort? I think definitely yes.

Case 2: Simple web pages

Once the common libraries of the web application are fixed, taking care of the web pages is usually much faster and simpler. Basically, these pages only require some libraries and generate a web page. Not much to do here. The only tedious part is all those $_GET and $_POST accesses, which require some validation and a type-cast; it is better to replace them with some handy library function that returns the value if available, or a default value if one is defined, or finally throws an exception if no default is defined. Here is an example of how to retrieve a hidden field from a post-back:

$n = $_POST['n'];
replace with
$n = getPostInt('n', 0); // optional 'n', defaults to 0

$i = $_POST['i'];
replace with
$i = getPostInt('i');  // mandatory 'i', throws unchecked
                       // exception if missing

/**
 * Returns integer parameter from post-back (no user data validation).
 * @param string $name Name of the POST parameter.
 * @param int $default_ Default value if parameter missing.
 * @return int Value of the parameter, or the default if missing.
 * @throws RuntimeException Missing parameter and no default specified.
 */
function getPostInt($name, $default_ = 0)
{
	if( isset($_POST[$name]) )
		return (int) $_POST[$name];
	else if( func_num_args() == 2 )
		return $default_;
	else
		throw new RuntimeException("missing '$name' in POST");
}

User data validation may require a bit more code, which depends on the application.
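As a rough idea of what that extra code might look like, here is a sketch of a stricter variant that checks both the format and the range of the value. The name getPostIntInRange and its parameters are hypothetical, and the accepted format (an optional minus sign and up to 9 digits) is just an assumption that a real application would adapt:

```php
<?php
/**
 * Returns an integer POST parameter after validating format and range.
 * Hypothetical helper, named here only for illustration.
 * @param string $name Name of the POST parameter.
 * @param int $min Minimum allowed value.
 * @param int $max Maximum allowed value.
 * @return int Validated value.
 * @throws RuntimeException Parameter missing, malformed or out of range.
 */
function getPostIntInRange($name, $min, $max)
{
	if( ! isset($_POST[$name]) || ! is_string($_POST[$name]) )
		throw new RuntimeException("missing '$name' in POST");
	$s = $_POST[$name];
	// Accepts only an optional minus sign followed by 1 to 9 digits.
	if( preg_match('/^-?[0-9]{1,9}$/D', $s) !== 1 )
		throw new RuntimeException("'$name' is not an integer");
	$v = (int) $s;
	if( $v < $min || $v > $max )
		throw new RuntimeException("'$name' out of range");
	return $v;
}

$_POST['age'] = '42';  // simulated post-back, for this example only
echo getPostIntInRange('age', 0, 150), "\n";
```

Limiting the input to 9 digits keeps the value within the range of a 32-bit int, so the cast cannot overflow.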

The other minor issues on a simple web page are all the variables used but not initialized, which can be addressed easily by initializing them to some value at the beginning of the page.

Once all this has been done, you may finally set the error detection to its maximum level for the whole application either in php.ini or at runtime:

	error_reporting(PHP_INT_MAX);

and still get an empty web server log file. This allows you to read the error log of the server on a regular daily basis to spot real issues that went unnoticed before. This is definitely the real added value of a well written and validated PHP web application: safety (the program either does its job or fails) and awareness (you may know if something went wrong).

Whether it is worth the effort depends only on you and your customer.

Error handling

Both libraries and web pages have to deal with errors and error handling. Most PHP programs simply ignore the issue and run blindly, generating random data and logging desperate error messages to an error log file nobody reads anyway.

The safer and simpler strategy here is mapping all the errors (and notices as well) to exceptions using the powerful set_error_handler() function. For this purpose the PHPLint standard library provides the "magic" errors.php module you may use in your libraries and programs, so that any error, any E_WARNING, and even any E_NOTICE is turned into a fatal ErrorException.

It might look too drastic at first, and most PHP programmers might be against it, but with a bit of discipline and regular use of PHPLint, soon it becomes impossible to do without this feature because it makes programs simpler and safer.
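A mapping of this kind can be sketched with PHP's set_error_handler() and the built-in ErrorException class. This is only a minimal illustration of the idea, not the actual errors.php module distributed with PHPLint, and the file name used below is hypothetical:

```php
<?php
// Minimal sketch of the idea; not the actual errors.php from PHPLint.
error_reporting(E_ALL);
set_error_handler(
	function ($errno, $message, $file, $line) {
		// Errors masked by the current error_reporting() level are left
		// to the standard PHP handler.
		if( (error_reporting() & $errno) === 0 )
			return false;
		throw new ErrorException($message, 0, $errno, $file, $line);
	}
);

try {
	// Opening a missing file normally raises E_WARNING and returns false;
	// with the handler above it throws instead.
	$f = fopen(__DIR__ . "/no-such-file.txt", "r");
	echo "not reached\n";
}
catch(ErrorException $e){
	echo "caught: ", $e->getMessage(), "\n";
}
```

Once the handler is installed, code can no longer continue past a failed call with an invalid value.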

The only drawback is that ErrorException is "checked" under PHPLint. This means it must either be caught with the try/catch statement, or declared as being thrown in the DocBlock. For example:

# Turns errors into ErrorException:
require_once __DIR__ . "/errors.php";

# No handling at all, fatal on error but safe:
$f = fopen("data.txt", "r");

# Full handling of the exception:
try {
	$f = fopen("data.txt", "r");
	$line = fgets($f);
	fclose($f);
	echo "read line $line";
}
catch(ErrorException $e){
	error_log("$e");
}

/**
 * @throws ErrorException
 */
function readData()
{
	$f = fopen("data.txt", "r");
	$line = fgets($f);
	fclose($f);
}

In the code above, note how errors are handled in the function readData(): they are not handled at all! The DocBlock simply states that the function may throw an exception, and the client software has to take care of this exception, either capturing it (again, with the try/catch statement) or propagating it (again, by declaring this exception as thrown in its DocBlock). Declaring thrown exceptions not only makes evident where a program may fail, but also makes it simpler to write safe and well documented libraries.

Programs and web pages where these functions are used then have a choice: capturing the exception and doing something with it, or terminating more or less abruptly.

Programs may simply ignore checked exceptions and let the program die and exit with an error status. Web pages may either let the server log the exception and generate an internal server error (code 500), or log the error and redirect the user to some alternative safe page before emitting any HTML code.
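The second option for web pages might look like the following sketch. The function buildPage() is a hypothetical stand-in for the real page logic (here it always fails, to show the error path), and /error.html is an assumed location for the safe page:

```php
<?php
/**
 * Stand-in for the real page logic; hypothetical, it always fails here.
 * @return string HTML of the whole page.
 * @throws ErrorException
 */
function buildPage()
{
	throw new ErrorException("cannot read data.txt");
}

try {
	// Build the whole page in memory before sending anything to the client,
	// so that a redirect is still possible if something fails.
	$html = buildPage();
	echo $html;
}
catch(ErrorException $e){
	error_log("$e");
	header("Location: /error.html");  // assumed safe page
}
```

Building the page in a string first is what keeps the redirect possible, since header() only works before any output has been sent.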


Umberto Salsi