Sunday, April 1, 2007

Debugging Strategies, Tips, and Gotchas

Use the Right ToolsIt should go without saying that you should always be using the best tools available; if you're hunting a segmentation fault, you want use a debugger. Anything less than that is unnecessary pain. If you're dealing with bizarre memory issues (or hard-to-diagnose segfaults), use Valgrind on Linux or Purify for Windows.
Debug the ProblemMy first instinct when debugging is to ask, "is my code too complicated?" Sometimes we'll all come up with a solution to a problem only to realize that the solution is really hard to get working. So hard, in fact, that it might be easier to solve the original problem in another way. When I see someone struggling to debug a complex mass of code, my first thought is to ask whether there's a cleaner solution. Often, once you've written bad code, you have a much better idea of what the good code should look like. Remember that just because you've written it doesn't mean you should keep it! The trick is always to decide if you're trying to solve the original problem or to solve a particular choice of solution. If it's the solution, then it's possible that your problems don't stem from the problem at all--maybe you're over-thinking the problem or trying a wrong-headed approach. For instance, I recently needed to parse a file and import some of the data into an access database to prototype an analysis tool. My first instinct was to write a Ruby script that interfaced directly with Access and inserted all of the data into the database using SQL queries. As I looked at the support for doing this in Ruby, I quickly realized that my "solution" to the problem was going to take a lot longer than the problem should have taken. I reversed course, wrote a script that just output a comma-separated value file, and had my data fully imported in about an hour.
An Aside on Bad CodePeople are often reluctant to throw out bad code that they've written and re-write it. One reason is that code that's written feels like completed work, and throwing it out feels like going backward. But when you're debugging, rewriting the code can seem more appealing because you're probably saving yourself time spent debugging by spending a bit more time coding. The trick is to avoid throwing out the baby with the bath water--remove the bad code, don't start the whole program over again (unless it's rotten to the core). Rewrite only the parts that really need it. Your second draft will probably be both cleaner and less buggy than the first, and you may avoid issues like having to go back later and rewrite the code just so that you can figure out how it was supposed to work. On the other hand, when you're absolutely sure that code that looks horrible is the right code to use, you'll want to explain your rationale in a comment so someone (or you) doesn't come back later and hack it apart.
Minimize Potential Problems by Avoiding Copy-Paste SyndromeNothing is more frustrating than to realize that you're debugging the same problem multiple times. Whenever you copy and paste large chunks of code, you leave yourself open to the unknown demons inhabiting that code. If you haven't debugged it yet, odds are that you're going to have to. And if you forgot that you copied that code somewhere else, you're probably going to be debugging the same code more than once. (There are other reasons to avoid copy-paste syndrome; even worse than debugging the same code twice is finding the bug in only one piece of copy-pasted code.) The best way to avoid copy-paste syndrome is to use functions to encapsulate as much of your repeat code as possible. Some things can't easily be avoided in C++; you're going to write a lot of for loops no matter what you're doing, so you can't abstract away the whole looping process. But if you have the same loop body in multiple places, that might be a sign that it should be pulled into a separate function. As a bonus, this makes other future changes to your code easier and allows you to reuse the function without having to find a chunk of code to copy.
When to Copy CodeAlthough copying code is usually dangerous, there are times when it may be the best choice. For instance, if you need to make small, irregular tweaks to a chunk of code, but the bulk of it needs to remain the same, then copying, pasting, and careful editing might make sense. By copying the code, you avoid the chance that you introduce new bugs by mistyping the code. It should go without saying that you should have carefully debugged the code you plan to copy before you do so! (But I said it, and I'm not even paid by the word.) The second reason to copy code is when you have long variable names and a bad text editor. The best solution is generally to get a better text editor with keyword completion.
Make Big Problems Found Late Small Problems Found Early
Testing EarlyOne advantage of pulling out code and putting it into functions is that you can then separately test those functions. This means that you can sometimes avoid debugging big problems caused by simple bugs in the original functions. Nothing is more frustrating than writing perfectly correct code given how you thought a function (or a class) worked, only to find out that it doesn't work that way. This kind of unit testing requires some discipline and a good sense of what can go wrong with your code. Another advantage of early testing--especially if you write some or all of your tests up-front, before the code--is that you'll pay more attention to the specific interface to your class. If you can't test error handling because you're using an assert instead of an exception or error code, that might be an indication that you should be using some form of error reporting rather than asserts. (Of course, this won't always be the case--there are times when you just want to verify that your asserts work correctly.) Beyond error-reporting, writing tests is the first time you can test your code's interface, which is often as valuable as testing that the code works. If the interface to your class is clunky, or your functions have impossible-to-understand, let alone remember, argument lists, it might be time to rethink what you're doing before you write the underlying code.
Compiler WarningsMany potential bugs can be caught by your compiler. Some such errors include using uninitialized variables, accidentally replacing a check for equality with an assignment in a conditional, or, in C++, errors related to mixing types such as pointers and ints. Since this has been covered before, I suggest checking out the article why you should pay attention to compiler warnings.
Printf LiesBecause I/O is usually buffered by the operating system, using printfs in your debugging process is risky. When possible, use a debugger to figure out what lines of code are the problem rather than narrowing in on the issue with code littered by printfs and cout. (And beware the stray printf that slips in during debugging and, ahem, slips into the final version.)
Flush OutputNevertheless, there are times when you actually need to keep track of some state in a log file--perhaps you simply have too much data that you need to collect, and you need the data from program start-up to the moment the bug occurs. To ensure you collect all of the data, be sure to flush it: you can use fflush in C, or output an endl in C++. fflush takes the FILE pointer you are writing into; for instance, to flush stderr, you would write fflush( stderr);
Check Your Helper FunctionsThis should be obvious, but it's easy to forget in the heat of the moment: always verify that your helper functions work, especially when seemingly simple code is failing. When possible, isolate each helper function and test it individually, then test each of its helper functions. There's nothing more frustrating than realizing that your original logic was right, but your assumption about a helper function was wrong.
When Cause Doesn't Immediately Lead to EffectEven if a helper function doesn't seem to be the immediate source of a problem, its side effects may cause latent problems. For instance, if you have a helper function that can return NULL and you pass its output into a library function dealing with C-strings, you may see the immediate cause as dereferencing a NULL pointer in strcat, but the real cause was the buggy function you wrote earlier (or the fact that you didn't check for NULL after calling it).
Remember That Code May Be Used in More Than One PlaceAnother problem that can come up when debugging is that you discover the problem appears to be the result of a particular function call, set a break point inside that function, and then discover that there are hundreds of calls to the same function throughout the code. Or worse, you don't notice this until wasting hours of time trying to figure out what's going on or thinking that the reason for the problem is that the function is being called incorrectly. (When, in fact, it's being called correctly but with different arguments than the point at which the bug occurred.) The most obvious solution is to check the call stack after hitting a break point or to set the breakpoint right before the call that is actually the problem. Unfortunately, this doesn't always help if the same call works thousands of times but fails on the 1001st call. Potential solutions include counting the number of calls to a function and then stepping through that many breakpoints set inside the function, or using a static variable as a counter.

No comments: