Monday, April 23, 2007

Programming Style, Naming Conventions

Good naming can make a huge difference in program readability. Names like proc1, proc2, and proc3 mean next-to-nothing, but even apparently decent names can be ambiguous. For instance, to name a function that takes two ranges and computes the amount of the first that lies within the second, names like computeRangeWithinRange might sound reasonable. Unfortunately, this name gives no information about the order of arguments to the function. Sometimes this won't be a problem because an intelligent IDE should help you determine the names of the arguments to the function, but it can be confusing when the code is printed, or when you try to quickly read through the code. A better name might be computeRangeWithinFirstRange, which at least gives a sense of the order of the arguments to the function.

General Naming Conventions

It's usually best to choose a consistent set of naming conventions for use throughout your code. Naming conventions usually govern things such as how you capitalize your variables, classes, and functions, whether you include a prefix for pointers, static data, or global data, and how you indicate that something is a private field of a class.

There are a lot of common naming conventions for classes, functions, and objects. Usually these are broken into several broad categories: C-style naming, camelCase, and CamelCase. C-style naming separates words in a name using underscores: this_is_an_identifier. There are two forms of camelCase: one that begins with a lowercase letter and then capitalizes the first letter of every ensuing word, and one that capitalizes the first letter of every single word. One popular convention is that leading-capital CamelCase is used for the names of structs and classes, while normal camelCase is used for the names of functions and variables (although sometimes variables are written in C-style to make the visual separation between functions and variables clearer).

It can be useful to use prefixes for certain types of data to remind you what they are: for instance, if you have a pointer, prefixing it with "p_" tells you that it's a pointer. If you see an assignment between a variable starting with "p_" and one that doesn't begin with "p_", then you immediately know that something fishy is going on. It can also be useful to use a prefix for global or static variables, because each of these behaves differently from a normal local variable. In the case of global variables, a prefix is especially useful for preventing naming collisions with local variables (which can lead to confusion).

Finally, a common convention is to prefix the private fields and methods of a class with an underscore: e.g., _private_data.
This can make it easier to find out where to look in the body of a class for the declaration of a method, and it also helps keep straight what you should and should not do with a variable. For instance, a common rule is to avoid returning non-const references to fields of a class from functions that are more public than the field: if _age is a private field, then the public getAge function probably shouldn't return a non-const reference, since doing so effectively grants write access to the field!
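
Here is a rough sketch of how these conventions might look together in C++. The Person class, the g_personCount global, the "g_" prefix itself, and the haveBirthday function are all made up for illustration; only the _age field and the getAge function come from the discussion above.

```cpp
#include <cassert>
#include <string>

// Hypothetical global: the g_ prefix keeps it from being mistaken for a local.
int g_personCount = 0;

// Leading-capital CamelCase for the class name.
class Person {
public:
    Person(const std::string& name, int age) : _name(name), _age(age) {
        ++g_personCount;
    }

    // camelCase for functions. Returning by value (or by const reference)
    // means callers cannot write to the private fields through the getter.
    int getAge() const { return _age; }
    const std::string& getName() const { return _name; }

    // Writes go through a setter, which gets a chance to check them.
    void setAge(int age) {
        assert(age >= 0);
        _age = age;
    }

private:
    // Leading underscore marks the private fields.
    std::string _name;
    int _age;
};

// The p_ prefix marks the parameter as a pointer.
void haveBirthday(Person* p_person) {
    if (p_person != nullptr) {
        p_person->setAge(p_person->getAge() + 1);
    }
}
```

Had getAge returned int& instead, a caller could write person.getAge() = -5 and skip the check in setAge entirely.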

Hungarian Notation

Hungarian notation has commonly been associated with prefixing variables with information about their type, for instance, whether a variable is an integer or a double. This is usually not a useful thing to do, because your IDE will tell you the type of a variable, and it can lead to bizarre and complicated-looking names. The original idea behind Hungarian notation, however, was more general and useful: to create more abstract "types" that describe how the variable is used rather than how the variable is represented. This can be useful for keeping pointers and integers from intermixing, but it can also be a powerful technique for helping to separate concepts that are often used together, but that should not be mixed.
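
As one possible illustration of the "how it is used" flavor, consider row and column indices: they have the same type and almost always travel together, yet swapping them is a bug. In the sketch below, the Grid class and the "row", "col", and "c" (count) prefixes are assumptions chosen for the example, not a standard.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical prefixes: "row" = row index, "col" = column index, "c" = count.
// Every one of these is a std::size_t, so the compiler cannot tell them
// apart; the prefixes let a human reader do it instead.
class Grid {
public:
    Grid(std::size_t cRows, std::size_t cCols)
        : _cRows(cRows), _cCols(cCols), _cells(cRows * cCols, 0) {}

    int get(std::size_t rowCell, std::size_t colCell) const {
        return _cells[_index(rowCell, colCell)];
    }

    void set(std::size_t rowCell, std::size_t colCell, int value) {
        _cells[_index(rowCell, colCell)] = value;
    }

private:
    // Maps a (row, column) pair to a position in the flat, row-major storage.
    std::size_t _index(std::size_t rowCell, std::size_t colCell) const {
        assert(rowCell < _cRows && colCell < _cCols);
        return rowCell * _cCols + colCell;
    }

    std::size_t _cRows;
    std::size_t _cCols;
    std::vector<int> _cells;
};
```

With names like rowTarget and colTarget at the call site, grid.set(rowTarget, colTarget, 7) reads as correct, while grid.set(colTarget, rowTarget, 7) looks wrong at a glance even though it compiles.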

Abbreviations

Abbreviations are dangerous: vowels are useful and can speed up code reading. Resorting to abbreviations can be useful when the name itself is extremely long, because names that are too long can be as hard to read as names that are too short. When possible, be consistent about using particular abbreviations, and restrict yourself to using only a small number of them. Common abbreviations include "itr" for "iterator" or "ptr" for "pointer". Even names like i, j, and k are perfectly fine for loop counter variables (primarily because they are so common). Bad abbreviations include things like cmptRngFrmRng, which, for a savings of only a few letters, eliminates a great deal of readability. If you don't like typing long names, look into the auto-complete facilities of your text editor. You should rarely need to type out a full identifier. (In fact, you rarely want to do this: typos can be incredibly hard to spot.)
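
A small sketch of the abbreviations that tend to be safe: "itr" for an iterator, and i and j for loop counters, next to function names that are spelled out in full. The sumOfSquares and countEqualPairs functions are invented for this example.

```cpp
#include <cstddef>
#include <vector>

// "itr" is a common enough abbreviation that it costs the reader nothing.
int sumOfSquares(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator itr = values.begin();
         itr != values.end(); ++itr) {
        total += (*itr) * (*itr);
    }
    return total;
}

// i and j are fine as loop counters; the function name stays unabbreviated.
int countEqualPairs(const std::vector<int>& values) {
    int count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            if (values[i] == values[j]) {
                ++count;
            }
        }
    }
    return count;
}
```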
