Sunday, April 22, 2007

Writing for Readability

There are a lot of ways to solve the same problem in C or C++. This is both good and bad; it is good because you have flexibility. It's also bad because you have flexibility--the flexibility to choose different solutions to the same problem when it shows up in different places. This is confusing because it obscures the underlying similarity between the problems.
Using Functions

Unlike prose, where repeating the same word or phrase may seem redundant, in programming, it's perfectly fine to use the same construction over and over again. Of course, you may want to turn a repeated chunk of code into a function: this is even more readable because it gives the block of code a descriptive name. (At least you ought to make it descriptive!) You can also increase readability by using standard functions and data structures (such as the STL). Doing so avoids the confusion of someone who might ask, "why did you create a new function when you had a perfectly good one already available?" The problem is that people may assume that there's a reason for the new function and that it somehow differs from the standard version. Moreover, by using standard functions you help your reader understand the names of the arguments to the function. There's much less need to look at the function prototype to see what the arguments mean, or their order, or whether some arguments have default values.
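As a rough sketch of the point about standard functions, here is a small helper that leans on std::max_element rather than a hand-written search loop. The name highest_score and the choice to return 0 for an empty list are my own assumptions for illustration, not anything the standard library dictates.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: the name and the "0 for an empty list" rule are
// illustrative assumptions only.
int highest_score(const std::vector<int>& scores)
{
    if (scores.empty())
    {
        return 0;
    }
    // std::max_element is the standard way to find the largest element,
    // so the reader doesn't have to inspect a custom loop for surprises.
    return *std::max_element(scores.begin(), scores.end());
}
```

The descriptive function name says what the block is for, and the standard call inside it says how, without the reader wondering whether a custom search differs from the usual one.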
Use Appropriate Language Features

There are some obvious things to avoid: don't use a loop as though it were an if statement. Choose the right data type for your data: if you never need decimal places in a number, use an integer. If you mean for a value to be unsigned, use an unsigned type. When you want to indicate that a value should never change, use const to make it so.
Try to avoid uncommon constructions unless you have good reason to use them; put another way, don't use a feature just because the feature exists. One rule of thumb is to avoid do-while loops unless you absolutely need one. People aren't generally as used to seeing them and, in theory, won't process them as well. I've never run into this problem myself, but think carefully about whether you actually need a do-while loop. Similarly, although the ternary operator is a great way of expressing some ideas, it can also be confusing for programmers who don't use it very often. A good rule of thumb is to use it only when necessary (for instance, in the initialization list of a constructor) and stick with the more standard if-else construction for everything else. Sure, it'll make your program four lines longer, but it'll make it that much easier for most people to read.
There are some less obvious ways of using standard features. When you are looping, choose carefully between while, do-while, and for. For loops are best when you can fill in each part (initialization, conditional, and increment) with a fairly short expression. While loops are good for watching a sentinel variable whose value can be set in multiple places or whose value depends on some external event such as a network event.
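To make the sentinel case concrete, here is a minimal sketch in which the loop watches a flag that gets set in three different places. The function name sum_until_stop, the "negative reading means stop" convention, and the limit parameter are all made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: add up readings until the input runs out, a
// negative reading appears, or the running total reaches the limit.
int sum_until_stop(const std::vector<int>& readings, int limit)
{
    bool done = false;        // sentinel, set in several places below
    std::size_t next = 0;
    int total = 0;

    while (!done)
    {
        if (next == readings.size())
        {
            done = true;      // ran out of input
        }
        else if (readings[next] < 0)
        {
            done = true;      // stop marker found in the data
        }
        else
        {
            total += readings[next];
            ++next;
            if (total >= limit)
            {
                done = true;  // reached the limit
            }
        }
    }
    return total;
}
```

None of the three exits fits neatly into the increment slot of a for loop, which is why the while form reads more naturally here.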
While loops are also better when the update step isn't really a direct "update" to the control variable--for instance, when reading lines from a text file, it might make more sense to use a while loop than a for loop because the control depends on the result of the function call, not the value of the variable of interest:

    while (fgets(buf, sizeof(buf), fp) != NULL)
    {
        /* do stuff with buf */
    }

It wouldn't make sense to write this sort of thing as a for loop. (Try it!)
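If you'd rather see the attempt than try it, here is one way the comparison might look. Both functions are hypothetical helpers I've invented for illustration; they add up the lengths of everything fgets reads, once with a while loop and once with a for loop.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: total characters read, using the while form.
std::size_t count_chars_while(std::FILE* fp)
{
    char buf[256];
    std::size_t total = 0;
    while (std::fgets(buf, sizeof(buf), fp) != NULL)
    {
        total += std::strlen(buf);
    }
    return total;
}

// The same logic forced into a for loop. The initialization and
// increment slots have nothing to hold, so they stay empty.
std::size_t count_chars_for(std::FILE* fp)
{
    char buf[256];
    std::size_t total = 0;
    for (; std::fgets(buf, sizeof(buf), fp) != NULL; )
    {
        total += std::strlen(buf);
    }
    return total;
}
```

The two behave identically, but the for version is just a while loop with extra semicolons, which is a decent hint that while was the right choice.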
Unpack Complex Expressions

There's no reason to put everything on a single line. If you have a complex calculation with multiple steps and levels of parentheses, it can be extremely helpful to go from a one-line calculation to one that uses temporary variables. This gives you two advantages: first, it makes it easier to follow the expression. Second, you can give a distinct name to each intermediate step, which can help the reader follow what is happening. Often, you'll want to reuse those intermediate calculations anyway. In addition to mathematical calculations, this principle also applies to nested function calls. The fewer events that take place on a single line of code, the easier it is to follow exactly what's happening. Another advantage to unpacking an expression is that you can put more comments in-line to explain what's going on and why.
Avoid Magic Numbers

Magic numbers are numbers that appear directly in the code without an obvious reason. For instance, what does the number 80 in the following expression mean?

    for( int i = 0; i < 80; ++i )
    {
        printf( "-" );
    }

It might be the width of the screen, but it might also be the width of a map whose wall is being drawn. You just don't know. The best solution is to use macros in C or constants in C++. This gives you the chance to descriptively name your numbers. Doing so also makes it easier to spot the use of a particular number and differentiate between numbers with the same value that mean different things. Moreover, if you decide you need to change a value, you have a single point where you can make the change, rather than having to sift through your code.
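As a sketch, the loop above might be rewritten with a named constant like this. SCREEN_WIDTH, MAP_WIDTH, and the two functions are names I've made up for the example, and the functions return strings rather than printing so the result is easy to check.

```cpp
#include <string>

// Hypothetical constants: same value, different meanings.
// (In C, the equivalent would be macros such as: #define SCREEN_WIDTH 80)
const int SCREEN_WIDTH = 80;  // columns on the screen
const int MAP_WIDTH = 80;     // tiles across the map

// Builds a divider as wide as the screen.
std::string make_divider()
{
    std::string line;
    for (int i = 0; i < SCREEN_WIDTH; ++i)
    {
        line += '-';
    }
    return line;
}

// Builds the top wall of the map.
std::string make_map_wall()
{
    std::string wall;
    for (int i = 0; i < MAP_WIDTH; ++i)
    {
        wall += '#';
    }
    return wall;
}
```

If the map later grows to 120 tiles, only MAP_WIDTH changes, and the divider is unaffected even though both started out as 80.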
