Monday, April 30, 2007

Binary Trees: Part 1

The binary tree is a fundamental data structure used in computer science, useful for rapidly storing sorted data and rapidly retrieving it. A binary tree is composed of nodes, each of which stores data and also links to up to two child nodes, which can be visualized spatially as sitting below the parent node, one placed to the left and one placed to the right. It is the relationship between each parent node and its children that makes the binary tree such an efficient data structure: the left child has a lesser key value (i.e., the value used to search for a node in the tree), and the right child has an equal or greater key value. As a result, the nodes on the farthest left of the tree have the lowest values, whereas the nodes on the right of the tree have the greatest values. More importantly, because each node is itself the root of a new, smaller binary tree, it is possible to easily access and insert data in a binary tree using search and insert functions called recursively on successive nodes.
The typical graphical representation of a binary tree is essentially that of an upside down tree. It begins with a root node, which contains the original key value. The root node has two child nodes; each child node might have its own child nodes. Ideally, the tree would be perfectly balanced, with each node's left and right subtrees containing the same number of nodes. A perfectly balanced tree allows for the fastest average insertion and retrieval of data. The worst case is a tree in which each node has only one child node, which behaves like a linked list in terms of speed. The typical representation of a binary tree looks like the following:
     10
    /  \
   6    14
  / \   / \
 5   8 11  18

The node storing 10, represented here simply as 10, is the root node, linking to a left and a right child node, with the left child storing a lower value than the parent and the right child storing a greater value. Notice that if one removed the root node and the right subtree, the node storing the value 6 would become the root of a new, smaller binary tree.

The structure of a binary tree makes the insertion and search functions simple to implement using recursion; in fact, the two functions are very similar. Inserting data into a binary tree involves searching for an unused node in the proper position in the tree in which to place the key value. The insert function is generally a recursive function that moves down the levels of the tree until it finds an unused leaf in a position that follows the rules for placing nodes: a lower value goes to the left of a node, and a greater or equal value goes to the right. Following these rules, the insert function checks each node to see if it is empty; if so, it inserts the data to be stored along with the key value (in most implementations, an empty node is simply a NULL pointer from a parent node, so the function must also create the node). If the node is already filled, the insert function checks whether the key value to be inserted is less than the key value of the current node, and if so, recursively calls itself on the left child node; if the key value to be inserted is greater than or equal to the key value of the current node, it recursively calls itself on the right child node. The search function works in a similar fashion. It checks whether the key value of the current node is the value being searched for. If not, it checks whether the value being searched for is less than the value of the node, in which case it is recursively called on the left child node, or whether it is greater, in which case it is recursively called on the right child node. Of course, it is also necessary to check that the left or right child node actually exists before calling the function on it.

Because a reasonably balanced binary tree has roughly log2(n) levels, the average search time is proportional to log2(n), and filling an entire tree of n elements takes roughly n * log2(n) operations. Let's take a look at the necessary code for a simple implementation of a binary tree. First, it is necessary to have a struct, or class, defined as a node:

struct node
{
int key_value;
struct node *left;
struct node *right;
};
The struct has the ability to store the key_value and contains pointers to the two child nodes which define the node as part of a tree. In fact, the node itself is very similar to a node in a linked list. A basic knowledge of the code for a linked list will be very helpful in understanding the techniques of binary trees. Essentially, pointers are necessary to allow the arbitrary creation of new nodes in the tree.

There are several important operations on binary trees, including inserting elements, searching for elements, removing elements, and deleting the tree. We'll look at three of those four operations in this tutorial, leaving removing elements for later. We'll also need to keep track of the root node of the binary tree, which will give us access to the rest of the data:

struct node *root = 0;
It is necessary to initialize root to 0 so that the other functions can recognize that the tree does not yet exist. The destroy_tree function shown below will free all of the nodes in the tree stored under the node leaf:

void destroy_tree(struct node *leaf)
{
if( leaf != 0 )
{
destroy_tree(leaf->left);
destroy_tree(leaf->right);
free( leaf );
}
}

The function destroy_tree works its way down to the bottom of each part of the tree--that is, it recurses while there is a non-null node--frees that leaf, and then works its way back up. It frees the leftmost node, then that node's sibling (the right child of the same parent), then the parent itself, and then works its way back to the other child of the parent of the node it just freed, continuing this way up to the node on which destroy_tree was originally called. In the example tree above, the order of deletion of nodes would be 5 8 6 11 18 14 10. Note that it is necessary to free all the child nodes to avoid leaking memory.

The following insert function will create a new tree if necessary; it relies on a pointer to a pointer in order to handle the case of a non-existent tree (the root pointing to NULL). In particular, by taking a pointer to a pointer, it can allocate memory when the pointer it was handed is NULL:

void insert(int key, struct node **leaf)
{
if( *leaf == 0 )
{
*leaf = malloc( sizeof( struct node ) );
(*leaf)->key_value = key;
/* initialize the children to null */
(*leaf)->left = 0;
(*leaf)->right = 0;
}
else if(key < (*leaf)->key_value)
{
insert( key, &(*leaf)->left );
}
else if(key > (*leaf)->key_value)
{
insert( key, &(*leaf)->right );
}
}

The insert function searches down the tree of child nodes, following the prescribed rules--left for a lower value to be inserted, right for a greater value--until it reaches a NULL node (an empty node), which it allocates memory for and initializes with the key value, setting the new node's child pointers to NULL. After creating the new node, the insert function no longer calls itself. Note also that if the element is already in the tree, it will not be added twice.

struct node *search(int key, struct node *leaf)
{
if( leaf != 0 )
{
if(key==leaf->key_value)
{
return leaf;
}
else if(key < leaf->key_value)
{
return search(key, leaf->left);
}
else
{
return search(key, leaf->right);
}
}
else return 0;
}

The search function shown above recursively moves down the tree until it either reaches a node whose key value equals the value being searched for, or reaches an uninitialized (NULL) node, meaning the value is not stored in the binary tree. It returns a pointer to the node it found, or 0 if the value is not in the tree, to the instance of the function that called it.
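To see how these pieces fit together, here is a minimal sketch of a program that uses the functions above. It assumes the corrected insert, search, and destroy_tree definitions from this article are compiled along with it; the values inserted are just the ones from the example tree.

/* Minimal usage sketch; assumes struct node, insert, search, and
   destroy_tree from this article are defined in the same program. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct node *root = 0;          /* start with an empty tree */
    int values[] = { 10, 6, 14, 5, 8, 11, 18 };
    int i;

    for (i = 0; i < 7; ++i)
        insert(values[i], &root);   /* pass the address of the root pointer */

    if (search(8, root) != 0)
        printf("8 is in the tree\n");
    if (search(9, root) == 0)
        printf("9 is not in the tree\n");

    destroy_tree(root);             /* free every node before exiting */
    return 0;
}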

Sunday, April 29, 2007

Debugging Strategies, Tips, and Gotchas

Debugging can be tedious and painful if you don't set up your programs to help you debug them. In the spirit of "an apple a day keeps the doctor away", this article suggests approaches to writing code that's more debuggable, shows how to catch problems before they start, and points out some time-wasting gotchas to watch out for.
Use the Right ToolsIt should go without saying that you should always be using the best tools available; if you're hunting a segmentation fault, you want to use a debugger. Anything less than that is unnecessary pain. If you're dealing with bizarre memory issues (or hard-to-diagnose segfaults), use Valgrind on Linux or Purify on Windows.
Debug the ProblemMy first instinct when debugging is to ask, "is my code too complicated?" Sometimes we'll all come up with a solution to a problem only to realize that the solution is really hard to get working. So hard, in fact, that it might be easier to solve the original problem in another way. When I see someone struggling to debug a complex mass of code, my first thought is to ask whether there's a cleaner solution. Often, once you've written bad code, you have a much better idea of what the good code should look like. Remember that just because you've written it doesn't mean you should keep it! The trick is always to decide whether you're trying to solve the original problem or to salvage a particular choice of solution. If it's the solution, then it's possible that your problems don't stem from the problem at all--maybe you're over-thinking the problem or trying a wrong-headed approach. For instance, I recently needed to parse a file and import some of the data into an Access database to prototype an analysis tool. My first instinct was to write a Ruby script that interfaced directly with Access and inserted all of the data into the database using SQL queries. As I looked at the support for doing this in Ruby, I quickly realized that my "solution" to the problem was going to take a lot longer than the problem should have taken. I reversed course, wrote a script that just output a comma-separated value file, and had my data fully imported in about an hour.
An Aside on Bad CodePeople are often reluctant to throw out bad code that they've written and re-write it. One reason is that code that's written feels like completed work, and throwing it out feels like going backward. But when you're debugging, rewriting the code can seem more appealing because you're probably saving yourself time spent debugging by spending a bit more time coding. The trick is to avoid throwing out the baby with the bath water--remove the bad code, don't start the whole program over again (unless it's rotten to the core). Rewrite only the parts that really need it. Your second draft will probably be both cleaner and less buggy than the first, and you may avoid issues like having to go back later and rewrite the code just so that you can figure out how it was supposed to work. On the other hand, when you're absolutely sure that code that looks horrible is the right code to use, you'll want to explain your rationale in a comment so someone (or you) doesn't come back later and hack it apart.
Minimize Potential Problems by Avoiding Copy-Paste SyndromeNothing is more frustrating than to realize that you're debugging the same problem multiple times. Whenever you copy and paste large chunks of code, you leave yourself open to the unknown demons inhabiting that code. If you haven't debugged it yet, odds are that you're going to have to. And if you forgot that you copied that code somewhere else, you're probably going to be debugging the same code more than once. (There are other reasons to avoid copy-paste syndrome; even worse than debugging the same code twice is finding the bug in only one piece of copy-pasted code.) The best way to avoid copy-paste syndrome is to use functions to encapsulate as much of your repeat code as possible. Some things can't easily be avoided in C++; you're going to write a lot of for loops no matter what you're doing, so you can't abstract away the whole looping process. But if you have the same loop body in multiple places, that might be a sign that it should be pulled into a separate function. As a bonus, this makes other future changes to your code easier and allows you to reuse the function without having to find a chunk of code to copy.
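As a quick illustration of the refactoring this suggests (the function and data names here are made up), a summing loop that would otherwise be copy-pasted into two places can be pulled into one function that only has to be debugged once:

#include <stdio.h>

/* Hypothetical helper: the repeated loop body pulled into one function,
   so it only has to be debugged in one place. */
static double average(const int *values, int count)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < count; ++i)
        sum += values[i];
    return count > 0 ? sum / count : 0.0;
}

int main(void)
{
    int scores[]  = { 90, 85, 72 };
    int weights[] = { 1, 2, 3, 4 };

    /* Both call sites reuse the same, already-debugged function
       instead of duplicating the summing loop. */
    printf("average score:  %f\n", average(scores, 3));
    printf("average weight: %f\n", average(weights, 4));
    return 0;
}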
When to Copy CodeAlthough copying code is usually dangerous, there are times when it may be the best choice. For instance, if you need to make small, irregular tweaks to a chunk of code, but the bulk of it needs to remain the same, then copying, pasting, and careful editing might make sense. By copying the code, you avoid the chance that you introduce new bugs by mistyping the code. It should go without saying that you should have carefully debugged the code you plan to copy before you do so! (But I said it, and I'm not even paid by the word.) The second reason to copy code is when you have long variable names and a bad text editor. The best solution is generally to get a better text editor with keyword completion.
Make Big Problems Found Late Small Problems Found Early
Testing EarlyOne advantage of pulling out code and putting it into functions is that you can then separately test those functions. This means that you can sometimes avoid debugging big problems caused by simple bugs in the original functions. Nothing is more frustrating than writing perfectly correct code given how you thought a function (or a class) worked, only to find out that it doesn't work that way. This kind of unit testing requires some discipline and a good sense of what can go wrong with your code. Another advantage of early testing--especially if you write some or all of your tests up-front, before the code--is that you'll pay more attention to the specific interface to your class. If you can't test error handling because you're using an assert instead of an exception or error code, that might be an indication that you should be using some form of error reporting rather than asserts. (Of course, this won't always be the case--there are times when you just want to verify that your asserts work correctly.) Beyond error-reporting, writing tests is the first time you can test your code's interface, which is often as valuable as testing that the code works. If the interface to your class is clunky, or your functions have impossible-to-understand, let alone remember, argument lists, it might be time to rethink what you're doing before you write the underlying code.
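For example, a unit test in C can be as simple as a handful of asserts run against one helper in isolation. This is only a sketch--clamp is a hypothetical helper, and a real test suite would be more thorough--but it shows the idea of exercising a function on its own before the rest of the program depends on it:

#include <assert.h>
#include <stdio.h>

/* Hypothetical helper to be tested on its own. */
static int clamp(int value, int low, int high)
{
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}

int main(void)
{
    /* A few focused checks on the edge cases most likely to be wrong. */
    assert(clamp(5, 0, 10) == 5);
    assert(clamp(-3, 0, 10) == 0);
    assert(clamp(42, 0, 10) == 10);
    printf("clamp: all tests passed\n");
    return 0;
}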
Compiler WarningsMany potential bugs can be caught by your compiler. Some such errors include using uninitialized variables, accidentally replacing a check for equality with an assignment in a conditional, or, in C++, errors related to mixing types such as pointers and ints. Since this has been covered before, I suggest checking out the article why you should pay attention to compiler warnings.
Printf LiesBecause I/O is usually buffered by the operating system, using printfs in your debugging process is risky. When possible, use a debugger to figure out what lines of code are the problem rather than narrowing in on the issue with code littered by printfs and cout. (And beware the stray printf that slips in during debugging and, ahem, slips into the final version.)
Flush OutputNevertheless, there are times when you actually need to keep track of some state in a log file--perhaps you simply have too much data to collect, and you need the data from program start-up to the moment the bug occurs. To ensure you collect all of the data, be sure to flush it: you can use fflush in C, or output std::endl in C++. fflush takes the FILE pointer you are writing to; for instance, to flush stderr, you would write fflush(stderr);
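For instance, a debugging log might be written like this minimal sketch (the file name and messages are just placeholders):

#include <stdio.h>

int main(void)
{
    /* Hypothetical log file name; any writable path would do. */
    FILE *log = fopen("debug.log", "w");
    if (log == NULL)
        return 1;

    fprintf(log, "about to call the suspect function\n");
    fflush(log);      /* force the line out now, in case we crash next */

    fprintf(stderr, "progress marker\n");
    fflush(stderr);   /* stderr is usually unbuffered, but this makes sure */

    fclose(log);
    return 0;
}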
Check Your Helper FunctionsThis should be obvious, but it's easy to forget in the heat of the moment: always verify that your helper functions work, especially when seemingly simple code is failing. When possible, isolate each helper function and test it individually, then test each of its helper functions. There's nothing more frustrating than realizing that your original logic was right, but your assumption about a helper function was wrong.
When Cause Doesn't Immediately Lead to EffectEven if a helper function doesn't seem to be the immediate source of a problem, its side effects may cause latent problems. For instance, if you have a helper function that can return NULL and you pass its output into a library function dealing with C-strings, you may see the immediate cause as dereferencing a NULL pointer in strcat, but the real cause was the buggy function you wrote earlier (or the fact that you didn't check for NULL after calling it).
Remember That Code May Be Used in More Than One PlaceAnother problem that can come up when debugging is that you discover the problem appears to be the result of a particular function call, set a break point inside that function, and then discover that there are hundreds of calls to the same function throughout the code. Or worse, you don't notice this until wasting hours of time trying to figure out what's going on or thinking that the reason for the problem is that the function is being called incorrectly. (When, in fact, it's being called correctly but with different arguments than the point at which the bug occurred.) The most obvious solution is to check the call stack after hitting a break point or to set the breakpoint right before the call that is actually the problem. Unfortunately, this doesn't always help if the same call works thousands of times but fails on the 1001st call. Potential solutions include counting the number of calls to a function and then stepping through that many breakpoints set inside the function, or using a static variable as a counter.
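Here is a rough sketch of the static-counter idea; process_record and the count of 1001 are made up for illustration, and in practice you would pair the counter with a conditional breakpoint or a one-off diagnostic print:

#include <stdio.h>

/* Hypothetical frequently-called function; the static counter lets you
   break or log only on the call you care about. */
static int process_record(int record_id)
{
    static long call_count = 0;
    ++call_count;

    if (call_count == 1001)
    {
        /* A good spot for a breakpoint, or a one-off diagnostic print. */
        fprintf(stderr, "call %ld: record_id = %d\n", call_count, record_id);
    }

    return record_id * 2;   /* stand-in for the real work */
}

int main(void)
{
    int i;
    for (i = 0; i < 2000; ++i)
        process_record(i);
    return 0;
}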

Saturday, April 28, 2007

Makefiles

Makefiles are something of an arcane topic--one joke goes that there is only one makefile in the world and that all other makefiles are merely extensions of it. I assure you, however, that this is not true; I have written my own makefiles from time to time. In this article, I'll explain exactly how you can do it too!
Understanding Make -- BackgroundIf you've used make before, you can safely skip this section, which contains a bit of background on using make. A makefile is simply a way of associating short names, called targets, with a series of commands to execute when the action is requested. For instance, a common makefile target is "clean," which generally performs actions that clean up after the compiler--removing object files and the resulting executable. Make, when invoked from the command line, reads a makefile for its configuration. If not specified by the user, make will default to reading the file "Makefile" in the current directory. Generally, make is either invoked alone, which results in the default target, or with an explicit target. (In all of the below examples, % will be used to indicate the prompt.) To execute the default target: % make
to execute a particular target, such as clean: % make clean
Besides giving you short build commands, make can check the timestamps on files and determine which ones need to be recompiled; we'll look at this in more detail in the section on targets and dependencies. Just be aware that by using make, you can considerably reduce the number of times you recompile.
Elements of a MakefileMost makefiles have at least two basic components: macros and target definitions. Macros are useful in the same way constants are: they allow you to quickly change major facets of your program that appear in multiple places. For instance, you can create a macro to substitute the name of your compiler. Then if you move from using gcc to another compiler, you can quickly change your builds with only a one-line change.
CommentsNote that it's possible to include comments in makefiles: simply preface a comment with a pound sign, #, and the rest of the line will be ignored.
MacrosMacros are written in a simple x=y form. For instance, to set your C compiler to gcc, you might write: CC=gcc
To actually convert a macro into its value in a target, you simply enclose it within $(): for instance, to convert CC into the name of the compiler: $(CC) a_source_file.c
might expand to gcc a_source_file.c
It is possible to specify one macro in terms of another; for instance, you could have a macro for the compiler options, OPT, and the compiler, CC, combined into a compile-command, COMP: COMP = $(CC) $(OPT)
There are some macros that are specified by default; you can list them by typing % make -p
For instance, CC defaults to the cc compiler. Note that any environment variables that you have set will be imported as macros into your makefile (and will override the defaults).
TargetsTargets are the heart of what a makefile does: they convert a command-line input into a series of actions. For instance, the "make clean" command tells make to execute the code that follows the "clean" target. Targets have three components: the name of the target, the dependencies of the target, and finally the actions associated with the target:

target: [dependencies]
	command_1
	command_2
	...
Note that each command must be preceded by a tab (yes, a tab, not four, or eight, spaces). Be sure to prevent your text editor from expanding the tabs! The dependencies associated with a target are either other targets or files themselves. If they're files, then the target commands will only be executed if any of the dependent files have changed since the last time the command was executed. If the dependency is another target, then that target's commands will be evaluated in the same way. A simple command might have no dependencies if you want it to execute all the time. For example, "clean" might look like this: clean:
rm -f *.o core
On the other hand, if you have a command to compile a program, you probably want to make the compilation depend on the source files to compile. This might result in a makefile that looks like this: CC = gcc
FILES = in_one.c in_two.c
OUT_EXE = out_executable
build: $(FILES)
$(CC) -o $(OUT_EXE) $(FILES)
Now when you type "make build", if the dependencies in_one.c and in_two.c haven't changed since the target was last built, make will reply that there is "nothing to be done." Note that this can be problematic if you leave out a dependency! If this were an issue, one option would be to include a target to force a rebuild. This would depend on both the "clean" target and the build target (in that order). The above sample file could be amended to include this: CC = gcc
FILES = in_one.c in_two.c
OUT_EXE = out_executable
build: $(FILES)
$(CC) -o $(OUT_EXE) $(FILES)
clean:
rm -f *.o core
rebuild: clean build
Now when rebuild is the target, make will first execute the commands associated with clean and then those associated with build.
When Targets FailWhen a target is executed, it returns a status based on whether or not it was successful--if a target fails, then make will not execute any targets that depend on it. For instance, in the above example, if "clean" fails, then rebuild will not execute the "build" target. Unfortunately, this might happen if there is no core file to remove. Fortunately, this problem can be solved easily enough by including a minus sign in front of the command whose status should be ignored: clean:
-rm -f *.o core
The Default TargetTyping "make" alone should generally result in some kind of reasonable behavior. When you type "make" without specifying a target in the corresponding makefile, it will simply execute the first target in the makefile. Note that in the above example, the "build" target was placed above the "clean" target--this is more reasonable (and intuitive) behavior than removing the results of a build when the user types "make"!
Reading Someone Else's MakefileI hope that this document is enough to get you started using simple makefiles that help to automate chores or maintain someone else's work. The trick to understanding makefiles is simply to understand all of your compiler's flags--much (though not all) of the crypticness associated with makefiles is simply that they use macros that strip some of the context from an otherwise comprehensible compiler command. Your compiler's documentation can help enormously here. The second thing to remember is that when you invoke make, it will expand all of the macros for you--just by running make, it's very easy to see exactly what it will be doing. This can be tremendously helpful in figuring out a cryptic command.

Thursday, April 26, 2007

21 Ways To Promote Your Website - Part Three by Neil Stafford

In this last installment of the series, we'll take a look at the remaining seven ways you can promote your website.
Please note that the 21 ways we've looked at are by no means exhaustive and should be used to help you decide which way(s) you prefer and are comfortable using.
With that thought, let's move on to the final seven.
15) Reciprocal Links
This involves the swapping of links with other websites usually within your niche market; however, I've seen many reciprocal links on sites that aren't related.
Reciprocal link exchanges were originally used to build up your link popularity rating with the search engines, although over the last few years this has become less effective as Search Engine technology advances.
However, don't discount reciprocal links just yet.
On several of our sites, we still seek out high ranking and high traffic websites where we can both benefit from a reciprocal link exchange.
In this case, I'm not doing it for the benefit of search engine ranking but for the pure reason of traffic generation.
A link in a prominent place on a high traffic site will, by its very nature, generate traffic for you. And if you have your site set up correctly, you should be able to capture the visitor's name and email address.
16) Search Engine Listings or SEO
Let's get one thing straight - I am not a search engine expert by any stretch of the imagination. In fact I heard the best definition of SEO earlier last year when it was called...
"Search Engine Optimist"
However, I do understand the principles and do put in place strategies to take advantage of the natural search listings.
The easiest way is to add content to your website and link this from an index page or site map on your website.
I run many websites that are single sales letter type sites, and many of them have a page rank of 3, 4 or 5.
However, behind these pages are several dozen content pages that are linked via an index page. By checking my site statistics I can see which pages are driving traffic and in many cases new sales.
When setting up a new sales letter site, I'll use PPC traffic to gauge how well it will perform and for the ones that do really well I'll spend the time adding content pages to sit behind the main site.
17) Classified Adverts
I'm not talking about classified adverts online but simple adverts in your niche market's magazines and publications.
This simple strategy has made us thousands in various niche markets and has even had the magazine editorial team contact us to see if we would like to contribute to the magazine itself.
In specialist magazines you can often place a classified ad for only a few pounds or dollars and know that it will be reaching your target audience.
The idea of these adverts is not to sell your product off the page but to drive the readers to your website and preferably a name capture page.
To do this, your advert must contain a benefit to the reader to put down the magazine, go online and type in your web address.
18) Your Own Business Stationery
If you operate in many different niche markets and are only selling digital products then this may not be ideal for you, however, if you have a small number of markets then at a minimum I'd have business cards printed with your web address on them.
In our main niche markets, we have business cards that have an offer and call to action on the back to encourage people to visit our sites.
We also have letterheads, and if we send out a physical product, we enclose an insert with the branding and an offer or another call to action for the customer to take.
19) Email Campaigns
If people are already on your email list this shouldn't be the end of your traffic driving process. By making sub lists of your main list you can then send targeted messages to drive your customer and prospects to new sales pages and offers.
20) Forums
You may already know my view of forums within the Internet Marketing arena.
However, forums in niche markets are still an ideal way to drive traffic to your website. However, PLEASE don't go and blatantly promote your site on the forums...there are rules to follow.
First of all, find the forums in your niche and spend a bit of time 'lurking' reading the posts and watching how they are answered.
After a short time, you'll get the feel of the board and if they allow any promotion of websites or if you can use a signature file at the bottom of your post.
Once you understand the rules, start answering some of the questions on the board, and leave your URL and/or signature file at the end.
My own view with non marketing forums is not to try and answer all the questions; answer only a few and answer them completely with very good advice or suggestions.
This will get you noticed more and build up credibility with the other people on the board.
21) Conference Calls
Conference calls are an ideal way to build up an email list quickly and are an excellent relationship-building tool as well.
You can either have a free to join call or have attendees pay a fee to join you. Either way, I'd approach other 'players' within your market and ask them whether they would promote your call for you. With a paid call you can offer an affiliate deal.
The call should be on a specific topic and during the call you can make reference to several pages on your site or make a specific sales offer for attendees.
With a free call you can make the MP3 recording available afterwards encouraging your attendees to tell others about it. This will create a viral effect as the call gets passed around and in turn drive traffic back to your website.
I have an mp3 that I recorded more than 3 years ago that still drives traffic to one of my sites each and every week!
Summary
So there you have it, the conclusion of '21 Ways To Promote Your Website'. Which ones will you implement into your business?
Running a successful web business is simple, but not easy. If it was easy everyone would be doing it.
However, it is simple...
What could be simpler than having a product that people are actively looking for and letting them know where they can get it from?
About the Author
Neil Stafford is Editor and Publisher of the Internet Marketing Review the UK's longest running PRINTED Internet Marketing Newsletter. 'Test drive' the Newsletter for FREE - Visit this special web page for more information: http://www.InternetMarketingReview.com/sya

Wednesday, April 25, 2007

Advanced Makefile Tricks

Special MacrosThere are some special macros that you can use when you want fine-grained control over behavior. These are macros whose values are set based on specifics of the target and its dependencies. All special macros begin with a dollar sign and do not need to be surrounded by parentheses:
$@$@ is the name of the target. This allows you to easily write a generic action that can be used for multiple different targets that produce different output files. For example, the following two targets produce output files named client and server respectively. client: client.c
$(CC) client.c -o $@
server: server.c
$(CC) server.c -o $@
$?The $? macro stores the list of dependents more recent than the target (i.e., those that have changed since the last time make was invoked for the given target). We can use this to make the build commands from the above example even more general: client: client.c
$(CC) $? -o $@
server: server.c
$(CC) $? -o $@
$^$^ gives you all dependencies, regardless of whether they are more recent than the target. Duplicate names, however, will be removed. This might be useful if you produce transient output (such as displaying a result to the screen rather than saving it to a file). # print the source to the screen
viewsource: client.c server.c
less $^
$+$+ is like $^, but it keeps duplicates and gives you the entire list of dependencies in the order they appear. # print the source to the screen
viewsource: client.c server.c
less $+
$<If you only need the first dependency, then $< is for you. Using $< can be safer than relying on $^ when you have only a single dependency that needs to appear in the commands executed by the target. If you start by using $^ when you have a single dependency and later add a second, it may be problematic, whereas if you had used $< from the beginning, it will continue to work. (Of course, you may want to have all dependencies show up. Consider your needs carefully.)
Wildcard Matching in TargetsThe percent sign, %, can be used to perform wildcard matching to write more general targets; when a % appears in the dependencies list, it is replaced by the same text that the % matched in the target. If you wish to use the matched text in the commands themselves, use the special variable $*. For instance, the following example will let you type make followed by the name of a .c file to build an executable with the corresponding name: %.c:
gcc -o $* $*.c
For this particular type of example, there is actually an even simpler way of writing this target using implicit targets--read on!
Implicit TargetsThere are some actions that are nearly ubiquitous: for instance, you might have a collection of .c files for which you wish to execute the same command. Ideally, the name of the file would be the target; using the implicit target ".c" you can specify a command to execute for any target that corresponds to the name of a .c file (minus the .c extension). .c:
$(CC) -o $@ $@.c
This rule says that for any target that corresponds to a .c file, make should compile it using the name of the implicit target as the output, and the name plus the .c extension as the file to compile. For example, % make test_executable
would run gcc -o test_executable test_executable.c
if test_executable did not have an explicit target associated with it.
Macro ModificationSince the point of using macros is to eliminate redundant text, it should come as no surprise that it is possible to transform macros from one type into another using various macro modifications.
Replacing TextIt is possible to create a new macro based on replacing part of an old macro. For instance, given a list of source files, called SRC, you might wish to generate the corresponding object files, stored in a macro called OBJ. To do so, you can specify that OBJ is equivalent to SRC, except with the .c extension replaced with a .o extension: OBJ = $(SRC:.c=.o)
Note that this is effectively saying that in the macro SRC, .c should be replaced with .o.

What is spyware?


Spyware is a general term used to describe software that performs certain behaviors such as advertising, collecting personal information, or changing the configuration of your computer, generally without appropriately obtaining your consent first.
Spyware is often associated with software that displays advertisements (called adware) or software that tracks personal or sensitive information.
That does not mean all software that provides ads or tracks your online activities is bad. For example, you might sign up for a free music service, but you "pay" for the service by agreeing to receive targeted ads. If you understand the terms and agree to them, you may have decided that it is a fair tradeoff. You might also agree to let the company track your online activities to determine which ads to show you.
Other kinds of spyware make changes to your computer that can be annoying and can cause your computer to slow down or crash.
These programs can change your Web browser's home page or search page, or add additional components to your browser you don't need or want. These programs also make it very difficult for you to change your settings back to the way you originally had them.
The key in all cases is whether or not you (or someone who uses your computer) understand what the software will do and have agreed to install the software on your computer.
There are a number of ways spyware or other unwanted software can get on your computer. A common trick is to covertly install the software during the installation of other software you want such as a music or video file sharing program.
Whenever you install something on your computer, make sure you carefully read all disclosures, including the license agreement and privacy statement. Sometimes the inclusion of unwanted software in a given software installation is documented, but it might appear at the end of a license agreement or privacy statement.

Tuesday, April 24, 2007

Determining the Size of a Class Object

By Girish Shetty

There are many factors that decide the size of an object of a class in C++. These factors are:
-Size of all non-static data members
-Order of data members
-Byte alignment or byte padding
-Size of its immediate base class
-The existence of virtual function(s) (Dynamic polymorphism using virtual functions).
-Compiler being used
-Mode of inheritance (virtual inheritance)
Size of all non-static data membersOnly non-static data members will be counted for calculating sizeof class/object.

class A {
private:
float iMem1;
const int iMem2;
static int iMem3;
char iMem4;
};
For an object of class A, the size will be the size of float iMem1 + the size of const int iMem2 + the size of char iMem4. The static member iMem3 is really not part of the class object and won't be included in the object's layout.

Order of data membersThe order in which one specifies data members also alters the size of the class.

class C {
char c;
int int1;
int int2;
int i;
long l;
short s;
};
The size of this class is 24 bytes on a typical 32-bit compiler. Even though char c will consume only 1 byte, 4 bytes will be allocated for it, and the remaining 3 bytes will be wasted (holes). This is because the next member is an int, which takes 4 bytes. If we didn't move to the next 4-byte boundary for storing this integer member, the memory access/modify cycle for this integer would take 2 read cycles. So the compiler will do this for us, unless we specify some byte padding/packing. If I re-write the above class with the data members in a different order, like below:

class C {
int int1;
int int2;
int i;
long l;
short s;
char c;
};
Now the size of this class is 20 bytes. In this case, the char c is stored in the same 4-byte slot as the short, in space that would otherwise have been padding.
Byte alignment or byte paddingAs mentioned above, if we specify 1 byte alignment, the size of the class above (class C) will be 19 in both cases.
Size of its immediate base classThe size of a class also includes the size of its immediate base class. Let's take an example:

class B {
...
int iMem1;
int iMem2;
};
class D: public B {
...
int iMem;
};
In this case, sizeof(D) will also include the size of B, so it will be 12 bytes.
The existence of virtual function(s)The existence of virtual function(s) will add 4 bytes for a virtual table pointer to the class, which will be added to the size of the class. Again, in this case, if the base class of the class already has virtual function(s), either directly or through its own base class, then this additional virtual function won't add anything to the size of the class. The virtual table pointer will be common across the class hierarchy. That is:

class Base {
public:
...
virtual void SomeFunction(...);
private:
int iAMem;
};
class Derived : public Base {
...
virtual void SomeOtherFunction(...);
private:
int iBMem;
};
In the example above, sizeof(Base) will be 8 bytes--that is, sizeof(int iAMem) + sizeof(vptr). sizeof(Derived) will be 12 bytes--that is, sizeof(Base) + sizeof(int iBMem). Notice that the existence of virtual functions in class Derived won't add anything more. Now Derived will set the vptr to its own virtual function table.
Compiler being usedIn some scenarios, the size of a class object can be compiler specific. Let's take one example:

class BaseClass {
int a;
char c;
};
class DerivedClass : public BaseClass {
char d;
int i;
};
If compiled with the Microsoft C++ compiler, the size of DerivedClass is 16 bytes. If compiled with gcc (either c++ or g++), size of DerivedClass is 12 bytes. The reason for sizeof(DerivedClass) being 16 bytes in MC++ is that it starts each class with a 4 byte aligned address so that accessing the member of that class will be easy (again, the memory read/write cycle).
Mode of inheritance (virtual inheritance)In C++, sometimes we have to use virtual inheritance for various reasons. (One classic example is the implementation of a final class in C++.) When we use virtual inheritance, there will be an overhead of 4 bytes for a virtual base class pointer in that class.

class ABase {
int iMem;
};
class BBase : public virtual ABase {
int iMem;
};
class CBase : public virtual ABase {
int iMem;
};
class ABCDerived : public BBase, public CBase {
int iMem;
};
And if you check the size of these classes, it will be:
Size of ABase : 4
Size of BBase : 12
Size of CBase : 12
Size of ABCDerived : 24

Because BBase and CBase are derived from ABase virtually, they each also carry a virtual base pointer, so 4 bytes are added to the size of each of them: sizeof(BBase) and sizeof(CBase) are each sizeof(ABase) + the size of their own int + the size of the virtual base pointer. The size of ABCDerived is 24, not 28 (sizeof(BBase) + sizeof(CBase) + the int member), because the virtually inherited ABase subobject is shared rather than duplicated, in much the same way that a virtual table pointer is shared across a hierarchy.

Monday, April 23, 2007

Programming Style, Naming Conventions

Good naming can make a huge difference in program readability. Names like proc1, proc2, and proc3 mean next-to-nothing, but even apparently decent names can be ambiguous. For instance, to name a function that takes two ranges and computes the amount of the first that lies within the second, names like computeRangeWithinRange might sound reasonable. Unfortunately, this name gives no information about the order of arguments to the function. Sometimes this won't be a problem because an intelligent IDE should help you determine the names of the arguments to the function, but it can be confusing when the code is printed, or when you try to quickly read through the code. A better name might be computeRangeWithinFirstRange, which at least gives a sense of the order of the arguments to the function.
General Naming ConventionsIt's usually best to choose a consistent set of naming conventions for use throughout your code. Naming conventions usually govern things such as how you capitalize your variables, classes, and functions, whether you include a prefix for pointers, static data, or global data, and how you indicate that something is a private field of a class. There are a lot of common naming conventions for classes, functions and objects. Usually these are broken into several broad categories: c-style naming, camelCase, and CamelCase. C-style naming separates words in a name using underscores: this_is_an_identifier. There are two forms of camelCase: one that begins with a lowercase letter and then capitalizes the first letter of every ensuing word, and one that capitalizes the first letter of every single word. One popular convention is that leading capital letter CamelCase is used for the names of structs and classes, while normal camelCase is used for the names of functions and variables (although sometimes variables are written in c-style to make the visual separation between functions and variables more clear). It can be useful to use prefixes for certain types of data to remind you what they are: for instance, if you have a pointer, prefixing it with "p_" tells you that it's a pointer. If you see an assignment between a variable starting with "p_" and one that doesn't begin with "p_", then you immediately know that something fishy is going on. It can also be useful to use a prefix for global or static variables because each of these has a different behavior than a normal local variable. In the case of global variables, it is especially useful to use a prefix in order to prevent naming collisions with local variables (which can lead to confusion). Finally, a common convention is to prefix the private fields and methods of a class with an underscore: e.g., _private_data. This can make it easier to find out where to look in the body of a class for the declaration of a method, and it also helps keep straight what you should and should not do with a variable. For instance, a common rule is to avoid returning non-const references to fields of a class from functions that are more public than the field. For instance, if _age is a private field, then the public getAge function probably shouldn't return a non-const reference since doing so effectively grants write access to the field!
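To make this concrete, here is one possible way these conventions might look in C. The specific prefixes (g_ for globals, p_ for pointers) and the names themselves are just illustrations of the ideas above, not a universal standard:

#include <stdio.h>

/* Leading-capital CamelCase for a type name. */
struct RangeCounter
{
    int lower_bound;    /* c-style names for data fields */
    int upper_bound;
};

/* "g_" marks a global, so it can't be mistaken for a local. */
static int g_total_ranges = 0;

/* camelCase for the function name, "p_" for a pointer parameter. */
static void printRangeCounter(const struct RangeCounter *p_counter)
{
    printf("[%d, %d]\n", p_counter->lower_bound, p_counter->upper_bound);
}

int main(void)
{
    struct RangeCounter week = { 1, 7 };
    ++g_total_ranges;
    printRangeCounter(&week);
    return 0;
}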
Hungarian NotationHungarian notation has commonly been associated with prefixing variables with information about their type--for instance, whether a variable is an integer or a double. This is usually not a useful thing to do because your IDE will tell you the type of a variable, and it can lead to bizarre and complicated looking names. The original idea behind Hungarian notation, however, was more general and useful: to create more abstract "types" that describe how the variable is used rather than how the variable is represented. This can be useful for keeping pointers and integers from intermixing, but it can also be a powerful technique for helping to separate concepts that are often used together, but that should not be mixed.
AbbreviationsAbbreviations are dangerous--vowels are useful and can speed up code reading. Resorting to abbreviations can be useful when the name itself is extremely long because names that are too long can be as hard to read as names that are too short. When possible, be consistent about using particular abbreviations, and restrict yourself to using only a small number of them. Common abbreviations include "itr" for "iterator" or "ptr" for pointer. Even names like i, j, and k are perfectly fine for loop counter variables (primarily because they are so common). Bad abbreviations include things like cmptRngFrmRng, which at the savings of only a few letters eliminates a great deal of readability. If you don't like typing long names, look into the auto-complete facilities of your text editor. You should rarely need to type out a full identifier. (In fact, you rarely want to do this: typos can be incredibly hard to spot.)

Sunday, April 22, 2007

Writing for Readability

There are a lot of ways to solve the same problem in C or C++. This is both good and bad; it is good because you have flexibility. It's also bad because you have flexibility--the flexibility to choose different solutions to the same problem when it shows up in different places. This is confusing because it obscures the underlying similarity between the problems.
Using FunctionsUnlike prose, where repeating the same word or phrase may seem redundant, in programming, it's perfectly fine to use the same construction over and over again. Of course, you may want to turn a repeated chunk of code into a function: this is even more readable because it gives the block of code a descriptive name. (At least you ought to make it descriptive!) You can also increase readability by using standard functions and data structures (such as the STL). Doing so avoids the confusion of someone who might ask, "why did you create a new function when you had a perfectly good one already available?" The problem is that people may assume that there's a reason for the new function and that it somehow differs from the standard version. Moreover, by using standard functions you help your reader understand the names of the arguments to the function. There's much less need to look at the function prototype to see what the arguments mean, or their order, or whether some arguments have default values.
Use Appropriate Language FeaturesThere are some obvious things to avoid: don't use a loop as though it were an if statement. Choose the right data type for your data: if you never need decimal places in a number, use an integer. If you mean for a value to be unsigned, use an unsigned number. When you want to indicate that a value should never change, use const to make it so. Try to avoid uncommon constructions unless you have good reason to use them; put another way, don't use a feature just because the feature exists. One rule of thumb is to avoid do-while loops unless you absolutely need one. People aren't generally as used to seeing them and, in theory, won't process them as well. I've never run into this problem myself, but think carefully about whether you actually need a do-while loop. Similarly, although the ternary operator is a great way of expressing some ideas, it can also be confusing for programmers who don't use it very often. A good rule of thumb is to use it only when necessary (for instance, in the initialization list of a constructor) and stick with the more standard if-else construction for everything else. Sure, it'll make your program four lines longer, but it'll make it that much easier for most people to read. There are some less obvious ways of using standard features. When you are looping, choose carefully between while, do-while, and for. For loops are best when you can fill in each part (initialization, conditional, and increment) with a fairly short expression. While loops are good for watching a sentinel variable whose value can be set in multiple places or whose value depends on some external event such as a network event. While loops are also better when the update step isn't really a direct "update" to the control variable--for instance, when reading lines from a text file, it might make more sense to use a while loop than a for loop because the control depends on the result of the method call, not the value of the variable of interest: while (fgets(buf, sizeof(buf), fp) != NULL)
{
/* do stuff with buf */
}
It wouldn't make sense to write this sort of thing as a for loop. (Try it!)
Unpack Complex ExpressionsThere's no reason to put everything on a single line. If you have a complex calculation with multiple steps and levels of parentheses, it can be extremely helpful to go from a one-line calculation to one that uses temporary variables. This gives you two advantages; first, it makes it easier to follow the expression. Second, you can give a distinct name to each intermediate step, which can help the reader follow what is happening. Often, you'll want to reuse those intermediate calculations anyway. In addition to mathematical calculations, this principle also applies to nested function calls. The fewer events that take place on a single line of code, the easier it is to follow exactly what's happening. Another advantage to unpacking an expression is that you can put more comments in-line to explain what's going on and why.
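For example (the calculation here is just an illustrative distance formula), compare a one-line expression with an unpacked version:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x1 = 1.0, y1 = 2.0, x2 = 4.0, y2 = 6.0;

    /* One-liner: correct, but harder to read and to comment. */
    double d1 = sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1));

    /* Unpacked: each intermediate step gets a name and can be inspected
       in a debugger or reused later. */
    double dx = x2 - x1;
    double dy = y2 - y1;
    double distance = sqrt(dx * dx + dy * dy);

    printf("%f %f\n", d1, distance);
    return 0;
}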
Avoid Magic NumbersMagic numbers are numbers that appear directly in the code without an obvious reason. For instance, what does the number 80 in the following expression mean? for( int i = 0; i < 80; ++i )
{
printf( "-" );
}
It might be the width of the screen, but it might also be the width of a map whose wall is being drawn. You just don't know. The best solution is to use macros, in C, or constants in C++. This gives you the chance to descriptively name your numbers. Doing so also makes it easier to spot the use of a particular number and differentiate between numbers with the same value that mean different things. Moreover, if you decide you need to change a value, you have a single point where you can make the change, rather than having to sift through your code.
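For instance, assuming the 80 above really did mean the width of the screen, the loop could be rewritten with a named constant (the name SCREEN_WIDTH is just one reasonable choice):

#include <stdio.h>

/* The name records the meaning of 80 and gives a single place to change it. */
#define SCREEN_WIDTH 80

int main(void)
{
    int i;
    for (i = 0; i < SCREEN_WIDTH; ++i)
    {
        printf("-");
    }
    printf("\n");
    return 0;
}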

Saturday, April 21, 2007

Unicode: What You Can Do About It Today

by Jeff Bezanson

If you write an email in Russian and send it to somebody in Russia, it is depressingly unlikely that he or she will be able to read it. If you write software, the burden of this sad state of affairs rests on your shoulders.
Given modern hardware resources, it is unacceptable that we can't yet routinely communicate text in different scripts or containing technical symbols. Fortunately, we are getting there.
After reading a lot on the subject and incorporating Unicode compatibility into some of my software, I decided to prepare this quick and highly pragmatic guide to digital text in the 21st century (for C programmers, of course). I don't mind adding my voice to the numerous articles that already exist on this subject, since the world needs as many programmers as possible to pick up these skills as soon as possible.
I. Encoding textGiven the variety of human languages on this planet, text is a complex subject. Many are scared away from dealing with world scripts, because they think of the numerous related software problems in the area instead of focusing on what they can actually do with their code to help.
The first thing to know is that you do not have to worry about most problems with digital text. The most difficult work is handled below the application layer, in OSes, UI libraries, and the C library. To give you an idea of what goes on though, here is a summary of software problems surrounding text:
EncodingMapping characters to numbers. Many such mappings exist; once you know the encoding of a piece of text, you know what character is meant by a particular number. Unicode is one such mapping, and a popular one since it incorporates more characters than any other at this time.
DisplayOnce you know what character is meant, you have to find a font that has the character and render it. This task is much complicated by the need to display both left-to-right and right-to-left text, the existence of combining characters that modify previous characters and have zero width, the fact that some languages require wider character cells than others, and context-sensitive letterforms.
InputAn input method is a way to map keystrokes (most likely several keystrokes on a typical keyboard) to characters. Input is also complicated by bidirectional text.
Internationalization (i18n)This refers to the practice of translating a program into multiple languages, effectively by translating all of the program's strings.
LexicographyCode that processes text as more than just binary data might have to become a lot smarter. The problems of searching, sorting, and modifying letter case (upper/lower) vary per-language. If your application doesn't need to perform such tasks, consider yourself lucky. If you do need these operations, you can probably find a UI toolkit or i18n library that already implements them. If you are savvy with just the first issue (encoding), then OS-vendor-supplied input methods and display routines should magically work with your program. Whether you want to or are able to translate your software is another matter, and compared to proper handling of character encodings it is almost optional (corrupting data is worse than having an unintelligible UI).
The encoding I'll talk about is called Unicode. Unicode officially encodes 1,114,112 characters, from 0x000000 to 0x10FFFF. (The idea that Unicode is a 16-bit encoding is completely wrong.) For maximum compatibility, individual Unicode values are usually passed around as 32-bit integers (4 bytes per character), even though this is more than necessary. For convenience, the first 128 Unicode characters are the same as those in the familiar ASCII encoding.
The consensus is that storing four bytes per character is wasteful, so a variety of representations have sprung up for Unicode characters. The most interesting one for C programmers is called UTF-8. UTF-8 is a "multi-byte" encoding scheme, meaning that it requires a variable number of bytes to represent a single Unicode value. Given a so-called "UTF-8 sequence", you can convert it to a Unicode value that refers to a character.
UTF-8 has the property that all existing 7-bit ASCII strings are still valid. UTF-8 only affects the meaning of bytes greater than 127, which it uses to represent higher Unicode characters. A character might require 1, 2, 3, or 4 bytes of storage depending on its value; more bytes are needed as values get larger. To store the full range of possible 32-bit characters, UTF-8 would require a whopping 6 bytes. But again, Unicode only defines characters up to 0x10FFFF, so this should never happen in practice.
UTF-8 is a specific scheme for mapping a sequence of 1-4 bytes to a number from 0x000000 to 0x10FFFF:
00000000 -- 0000007F: 0xxxxxxx
00000080 -- 000007FF: 110xxxxx 10xxxxxx
00000800 -- 0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
00010000 -- 001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
The x's are bits to be extracted from the sequence and glued together to form the final number.
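To make the table concrete, here is a minimal decoding sketch (the function name is made up for illustration, and it assumes the input is already known to be well-formed UTF-8; real code should validate first):
/* decode one UTF-8 sequence starting at s into a Unicode value;
   stores the sequence length in *len (assumes well-formed input) */
unsigned int decode_one(const unsigned char *s, int *len)
{
    if (s[0] < 0x80) {                      /* 0xxxxxxx */
        *len = 1;
        return s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx 10xxxxxx */
        *len = 2;
        return ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx 10xxxxxx 10xxxxxx */
        *len = 3;
        return ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    } else {                                /* 11110xxx followed by three 10xxxxxx bytes */
        *len = 4;
        return ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12) |
               ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    }
}
For example, the two-byte sequence 0xC3 0xA9 decodes to (0x03 << 6) | 0x29 = 0xE9, which is U+00E9 ("é").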
It is fair to say that UTF-8 is taking over the world. It is already used for filenames in Linux and is supported by all mainstream web browsers. This is not surprising considering its many nice properties:
It can represent all 1,114,112 Unicode characters.
Most C code that deals with strings on a byte-by-byte basis still works, since UTF-8 is fully compatible with 7-bit ASCII.
Characters usually require fewer than four bytes.
String sort order is preserved. In other words, sorting UTF-8 strings per-byte yields the same order as sorting them per-character by logical Unicode value.
A missing or corrupt byte in transmission can only affect a single character—you can always find the start of the sequence for the next character just by scanning a couple bytes.
There are no byte-order/endianness issues, since UTF-8 data is a byte stream. The only price to pay for all this is that there is no longer a one-to-one correspondence between bytes and characters in a string. Finding the nth character of a string requires iterating over the string from the beginning.
See What is UTF-8? for more information about UTF-8.
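To see what the loss of random access means in practice, here is a sketch of finding the byte position of the nth character. It relies only on the fact that continuation bytes always look like 10xxxxxx, and it is essentially what the u8_offset() routine listed later in this article does; the function name here is hypothetical.
/* return the byte offset of character number charnum (0-based) in a
   NUL-terminated UTF-8 string; O(n) in the length of the string */
int byte_offset_of(const char *s, int charnum)
{
    int i = 0;
    while (charnum > 0 && s[i]) {
        i++;
        /* we have reached the next character when the byte at i
           is not a continuation byte (10xxxxxx) */
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            charnum--;
    }
    return i;
}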
Side note: Some consider UTF-8 to be discriminatory, since it allows English text to be stored efficiently at one byte per character while other world scripts require two bytes or more. This is a troublesome point, but it should not get in the way of Unicode adoption. First of all, UTF-8 was not really designed to preferentially encode English text. It was designed to preserve compatibility with the large body of existing code that scans for special characters such as line breaks, spaces, NUL terminators, and so on. Furthermore, the encoding used internally by a program has little impact on the user as long as it is able to represent their data without loss. UTF-8 is a great boon, especially for C programming. Think of it this way: if it allows you to internationalize an application that would have been difficult to convert otherwise, it is much less discriminatory than the alternative.
II. The C library
All recent implementations of the standard C library have lots of functions for manipulating international strings. Before reading up on them, it helps to know some vocabulary:
"Multibyte character" or "multibyte string" refers to text in one of the many (possibly language-specific) encodings that exist throughout the world. A multibyte character does not necessarily require more than one byte to store; the term is merely intended to be broad enough to encompass encodings where this is the case. UTF-8 is in fact only one such encoding; the actual encoding of user input is determined by the user's current locale setting (selected as an option in a system dialog or stored as an environment variable in UNIX). Strings you get from the user will be in this encoding, and strings you pass to printf() are supposed to be as well. Strings within your program can of course be in any encoding you want, but you might have to convert them for proper display.
"Wide character" or "wide character string" refers to text where each character is the same size (usually a 32-bit integer) and simply represents a Unicode character value ("code point"). This format is a known common currency that allows you to get at character values if you want to. The wprintf() family is able to work with wide character format strings, and the "%ls" format specifier for normal printf() will print wide character strings (converting them to the correct locale-specific multibyte encoding on the way out).
The C library also provides functions like towupper() that can convert a wide character from any language to uppercase (if applicable). strftime() can format a date and time string appropriately for the current locale, and strcoll() can do international sorting. These and other functions that depend on locale must be initialized at the beginning of your program by calling setlocale():
#include <locale.h>

int main()
{
char *locale;
locale = setlocale(LC_ALL, "");
...
}
You don't have to do anything with the locale string returned by setlocale(), but you can use it to query your user's locale settings (more on this later).
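For example, one thing you can do with that string is guess whether the locale's encoding is UTF-8, which is roughly what the u8_is_locale_utf8() routine listed later in this article does. A minimal sketch (the function name here is made up, and the exact spelling of the codeset tag is platform-dependent):
#include <string.h>

/* crude check on the string returned by setlocale(): locale names usually
   look like "en_US.UTF-8" or "en_US.utf8", but the exact tag varies */
int looks_like_utf8_locale(const char *locale)
{
    if (!locale)
        return 0;
    return strstr(locale, "UTF-8") != NULL || strstr(locale, "utf8") != NULL;
}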
The C library pretty much assumes you will be using multibyte strings throughout your program (since that's what you get as input). Since multibyte strings are opaque, a lot of functions beginning with "mb" are provided to deal with them. Personally, I don't like not knowing what encoding my strings use. One concrete problem with the multibyte thing is file I/O— a given file could be in any encoding, independent of locale. When you write a file or send data over a network, keeping the multibyte encoding might be a bad idea. (Even if all software uses only the proper locale-independent C library functions, and all platforms support all encodings internally, there is still no single standard for communicating the encoding of a piece of text; email messages and HTML tags do it in various ways.) You also might be able to do more efficient processing, or avoid rewriting code, if you knew the encoding your strings used.
Your encoding options
You are free to choose a string encoding for internal use in your program. The choice pretty much boils down to either UTF-8, wide (4-byte) characters, or multibyte. Each has its advantages and disadvantages:
UTF-8
Pro: compatible with all existing strings and most existing code
Pro: takes less space
Pro: widely used as an interchange format (e.g. in XML)
Con: more complex processing, O(n) string indexing
Wide characters
Pro: easy to process
Con: wastes space
Pro/Con: although you can use the syntax L"Hello, world." to easily include wide-character strings in C programs, the size of wide characters is not consistent across platforms (some incorrectly use 2-byte wide characters)
Con: should not be used for output, since spurious zero bytes and other low-ASCII characters with common meanings (such as '/' and '\n') will likely be sprinkled throughout the data.
Multibyte
Pro: no conversions ever needed on input and output
Pro: built-in C library support
Pro: provides the widest possible internationalization, since in rare cases conversion between local encodings and Unicode does not work well
Con: strings are opaque
Con: perpetuates incompatibilities. For example, there are three major encodings for Russian. If one Russian sends data to another through your program, the recipient will not be able to read the message if his or her computer is configured for a different Russian encoding. But if your program always converts to UTF-8, the text is effectively normalized so that it will be widely legible (especially in the future) no matter what encoding it started in.
In this article I will advocate and give explicit instruction on using UTF-8 as an internal string encoding. Many Linux users already set their environment to a UTF-8 locale, in which case you won't even have to do any conversions. Otherwise you will have to convert multibyte to wide to UTF-8 on input, and back to multibyte on output. Nevertheless, UTF-8 has its advantages.
III. What to do right now
Below I'll outline concrete steps any C programmer could take to bring his or her code up to date with respect to text encoding. I'll also be presenting a simple C library that provides the routines you need to manipulate UTF-8.
Here's your to-do list:
"char" no longer means characterI hereby recommend referring to character codes in C programs using a 32-bit unsigned integer type. Many platforms provide a "wchar_t" (wide character) type, but unfortunately it is to be avoided since some compilers allot it only 16 bits—not enough to represent Unicode. Wherever you need to pass around an individual character, change "char" to "unsigned int" or similar. The only remaining use for the "char" type is to mean "byte".
Get UTF-8-clean
To take advantage of UTF-8, you'll have to treat bytes higher than 127 as perfectly ordinary characters. For example, say you have a routine that recognizes valid identifier names for a programming language. Your existing standard might be that identifiers begin with a letter:
int valid_identifier_start(char ch)
{
    return ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'));
}
If you use UTF-8, you can extend this to allow letters from other languages as follows:
int valid_identifier_start(char ch)
{
    return ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ||
            ((unsigned char)ch >= 0xC0));
}
A UTF-8 sequence can only start with values 0xC0 or greater, so that's what I used for checking the start of an identifier. Within an identifier, you would also want to allow characters >= 0x80, which is the range of UTF-8 continuation bytes.
Most C string library routines still work with UTF-8, since they only scan for terminating NUL characters. A notable exception is strchr(), which in this context is more aptly named "strbyte()". Since you will be passing character codes around as 32-bit integers, you need to replace this with a routine such as my u8_strchr() that can scan UTF-8 for a given character. The traditional strchr() returns a pointer to the location of the found character, and u8_strchr() follows suit. However, you might want to know the index of the found character, and since u8_strchr() has to scan through the string anyway, it keeps a count and returns a character index as well.
With the old strchr(), you could use pointer arithmetic to determine the character index. Now, any use of pointer arithmetic on strings is likely to be broken since characters are no longer bytes. You'll have to find and fix any code that assumes "(char*)b - (char*)a" is the number of characters between a and b (though it is still of course the number of bytes between a and b).
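For example, using the u8_strchr() and u8_charnum() routines listed later in this article, the byte offset and the character index of the same position look like this (a fragment, for illustration only):
int charn;
char s[] = "d\xc3\xa9j\xc3\xa0 vu";          /* "déjà vu": 9 bytes, 7 characters */
char *p = u8_strchr(s, 'v', &charn);         /* find the 'v' */
int byte_index = (int)(p - s);               /* 7: pointer arithmetic still counts bytes */
int char_index = u8_charnum(s, byte_index);  /* 5: the same position counted in characters */
/* charn is also 5, since u8_strchr counts characters as it scans */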
Interface with your environment
Using UTF-8 as an internal encoding is now widespread among C programmers. However, the environment your program runs in will not necessarily be nice enough to feed you UTF-8, or expect UTF-8 output.
The functions mbstowcs() and wcstombs() convert from and to locale-specific encodings, respectively. "mbs" means multibyte string (i.e. the locale-specific string), and "wcs" means wide character string (universal 4-byte characters). Clearly, if you use wide characters internally, you are in luck here. If you use UTF-8, there is a chance that the user's locale will be set to UTF-8 and you won't have to do any conversion at all. To take advantage of that situation, you will have to specifically detect it (I'll provide a function for it). Otherwise, you will have to convert from multibyte to wide to UTF-8.
Version 1.6 (1.5.x while in development) of the FOX toolkit uses UTF-8 internally, giving your program a nice all-UTF-8-all-the-time environment. GTK2 and Qt also support UTF-8.
Modify APIs to discourage O(n^2) string processing
The idea of non-constant-time string indexing may worry you. But when you think about it, you rarely need to specifically access the nth character of a string. Algorithms almost never need to make requests like "Quick! Get me the 6th character of this piece of text!" Typically, if you're accessing characters you're iterating over the whole string or most of it. UTF-8 is simple enough to process that iterating over characters takes essentially the same time as iterating over bytes.
In your own code, you can use my u8_inc() and u8_dec() to move through strings. If you develop libraries or languages, be sure to expose some kind of inc() and dec() API so nobody has to move through a string by repeatedly requesting the nth character.
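As a sketch of what that kind of iteration looks like with the routines declared in the next section (assuming their declarations are in scope; the wrapper name here is made up):
#include <stdio.h>

/* print the code point of every character in a UTF-8 string in one pass,
   using u8_nextchar() from the library described below */
void print_codepoints(char *s)
{
    int i = 0;
    while (s[i]) {
        unsigned int ch = u8_nextchar(s, &i);  /* returns the character and advances i */
        printf("U+%04X\n", ch);
    }
}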
IV. Some UTF-8 routines
Various libraries are available for internationalization and converting between different text encodings. However, I couldn't find a straightforward set of C routines providing the minimal support needed for using UTF-8 as an internal encoding (although this functionality is often embedded in large UI toolkits and such). I decided to create a small library that could be used to bring UTF-8 to arbitrary C programs.
This library is quite incomplete; you might want to look at related FSF offerings and libutf8. libutf8 provides the multibyte and wide character C library routines mentioned above, in case your C library doesn't have them.
Since performance is sometimes a concern with UTF-8, I made my routines as fast and lightweight as possible. They perform minimal error checking— in particular, they do not bother to determine whether a sequence is valid UTF-8, which can actually be a security problem. I justify this decision by reiterating that the intention of the library is to manipulate an internal encoding; you can enforce that all strings you store in memory be valid UTF-8, enabling the library to make that assumption. Routines for validating and converting from/to UTF-8 are available free from Unicode, Inc.
Note that my routines do not need to support the many encodings of the world—the C library can handle that. If the current locale is not UTF-8, you call mbstowcs() on user input to convert any encoding (whatever it is) to a wide character string, then use my u8_toutf8() to convert it to the UTF-8 your program is comfortable with. Here's an example input routine wrapping readline():
char *get_utf8_input()
{
char *line, *u8s;
unsigned int *wcs;
int len;
line = readline("");
if (locale_is_utf8) {
return line;
}
else {
len = mbstowcs(NULL, line, 0)+1;
wcs = malloc(len * sizeof(int));
mbstowcs(wcs, line, len);
u8s = malloc(len * sizeof(int));
u8_toutf8(u8s, len*sizeof(int), wcs, len);
free(line);
free(wcs);
return u8s;
}
}
The first call to mbstowcs() uses the special parameter value NULL to find the number of characters in the opaque multibyte string.
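For output you would go the other way: convert UTF-8 to wide characters with u8_toucs(), then hand the result to wcstombs() for the locale's multibyte encoding. Here is a rough sketch only, reusing the locale_is_utf8 flag from the input routine above and assuming a platform where wchar_t is a 32-bit type (so the unsigned int buffer can be passed to wcstombs()); the buffers are fixed-size just to keep the example short:
#include <stdio.h>
#include <stdlib.h>   /* wcstombs() */

void put_utf8_output(char *u8s)
{
    unsigned int wcs[256];
    char mbs[1024];
    int n;

    if (locale_is_utf8) {              /* locale already speaks UTF-8: no conversion */
        printf("%s", u8s);
        return;
    }
    n = u8_toucs(wcs, 255, u8s, -1);   /* -1: source is NUL-terminated */
    wcs[n] = 0;                        /* terminate for wcstombs() */
    wcstombs(mbs, (wchar_t *)wcs, sizeof(mbs));  /* cast assumes 32-bit wchar_t */
    printf("%s", mbs);
}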
Anyway, on with the routines. They are divided into four groups:
Group 1: conversions
/* is c the start of a utf8 sequence? */
#define isutf(c) (((c)&0xC0)!=0x80)
/* convert UTF-8 to UCS-4 (4-byte wide characters)
srcsz = source size in bytes, or -1 if 0-terminated
sz = dest size in # of wide characters
returns # characters converted */
int u8_toucs(unsigned int *dest, int sz, char *src, int srcsz);
/* convert UCS-4 to UTF-8
srcsz = number of source characters, or -1 if 0-terminated
sz = size of dest buffer in bytes
returns # characters converted */
int u8_toutf8(char *dest, int sz, unsigned int *src, int srcsz);
/* single character to UTF-8 */
int u8_wc_toutf8(char *dest, wchar_t ch);
Note that the library uses "unsigned int" as its wide character type. You can convert a known number of bytes, or a NUL-terminated string. The length of a UTF-8 string is often communicated as a byte count, since that's what really matters. Recall that you can usually treat a UTF-8 string like a normal C-string with N characters (where N is the number of bytes in the UTF-8 sequence), with the possibility that some characters are >127.
Group 2: moving through UTF-8 strings
/* character number to byte offset */
int u8_offset(char *str, int charnum);
/* byte offset to character number */
int u8_charnum(char *s, int offset);
/* return next character, updating a byte-index variable */
unsigned int u8_nextchar(char *s, int *i);
/* move to next character */
void u8_inc(char *s, int *i);
/* move to previous character */
void u8_dec(char *s, int *i);
Group 3: unicode escape sequences
In the absence of unicode input methods, unicode characters are often notated using special escape sequences beginning with \u or \U. \u expects up to four hexadecimal digits, and \U expects up to eight. With these routines your program can accept input and give output using such sequences if necessary.
/* assuming src points to the character after a backslash, read an
escape sequence, storing the result in dest and returning the number of
input characters processed */
int u8_read_escape_sequence(char *src, unsigned int *dest);
/* given a wide character, convert it to an ASCII escape sequence stored in
buf, where buf is "sz" bytes. returns the number of characters output. */
int u8_escape_wchar(char *buf, int sz, unsigned int ch);
/* convert a string "src" containing escape sequences to UTF-8 */
int u8_unescape(char *buf, int sz, char *src);
/* convert UTF-8 "src" to ASCII with escape sequences.
if escape_quotes is nonzero, quote characters will be preceded by
backslashes as well. */
int u8_escape(char *buf, int sz, char *src, int escape_quotes);
/* utility predicates used by the above */
int octal_digit(char c);
int hex_digit(char c);
Group 4: replacements for standard functions
/* return a pointer to the first occurrence of ch in s, or NULL if not
found. character index of found character returned in *charn. */
char *u8_strchr(char *s, unsigned int ch, int *charn);
/* same as the above, but searches a buffer of a given size instead of
a NUL-terminated string. */
char *u8_memchr(char *s, unsigned int ch, size_t sz, int *charn);
/* count the number of characters in a UTF-8 string */
int u8_strlen(char *s);
/* given the string returned by setlocale(), determine whether the current
locale speaks UTF-8 */
int u8_is_locale_utf8(char *locale);
/* these functions can print from UTF-8 strings. they make no assumptions
   about locale; you can bypass them and just use ordinary printf() if
   u8_is_locale_utf8() says the current locale is UTF-8 */
int u8_vprintf(char *fmt, va_list ap);
int u8_printf(char *fmt, ...);

Friday, April 20, 2007

So you want to write a game?

By Alex Hoffer
What exactly do you need to know to write a game? Well, you have to know what makes a game fun, of course, but from a technical perspective, what is there to it?
The best way to learn how to write games is, alas, to get a solid grounding in the basics of computer science.
Graphics
There is perhaps no single area of gaming in which players' standards have risen more than in the field of graphics. While players in the 70s and 80s were accustomed to screens of text or 16-bit black-and-white graphics, the users of today demand Riven-quality 3D graphics, complete with lighting effects, textures, and scale. Unfortunately, graphics are hard to do. They require math, at least on the level of multivariable calculus, and a great deal of time. But if you aren't quite a math wizard yet, don't despair; for more casual games, most people are willing to accept simple 2D sprites. Anyone who's ever been addicted to Tetris knows that while simple graphics may not cut it on the shelves any more, they're no barrier to a satisfying gaming experience. For more info on graphics programming, check out our tutorial on creating graphics with OpenGL and the 3D rotations series at the advanced tutorial section of the site.
Speed
If you're writing a game--particularly an arcade-style game--speed is all-important. Hardware advances have to some extent made optimization less relevant, but for every advance in hardware, an advance in graphics is sure to come along to take up the slack. And even a simpler game can run less than lightning-fast if it's written in an interpreted language like Java. Learning how to speed things up requires a familiarity with data structures and algorithms. Choosing the right data structure to store your game data can mean the difference between a pokey game and one that zips along. For information on efficient game programming, see AI Horizon's article on efficient chessboard representation or algorithmic efficiency and big-O notation.
Structure
No matter what kind of program you're writing, from an implementation of Space Invaders to a filing system for grandma's recipes, clear programming structure is vital when it comes to understanding your code when you come back to work on it after a break, or, even more so, if you plan on working with a buddy who has to be able to understand what you wrote or use your functions. Good commenting and readable style are a must. Games are also well-suited to object-oriented programming, since the elements of your game world are often, well, objects, like a spaceship, a bullet, or an evil alien vessel. Choosing classes appropriately is key, and inheritance can often come in handy (for example, if your game has several different light sources, or monsters, or doors, that all share some basic properties). For more tips on style, see how and why to comment, thinking about programming, and this series on programming style.
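To make the inheritance point concrete, here is a tiny C++ sketch (the class names are invented for illustration): shared properties live in a base class, and each kind of game entity overrides only the behavior that differs.
class Entity {
public:
    Entity(float x, float y) : x(x), y(y) {}
    virtual ~Entity() {}
    virtual void update() = 0;    // each entity type moves or acts differently
protected:
    float x, y;                   // position, shared by everything in the game world
};

class Bullet : public Entity {
public:
    Bullet(float x, float y) : Entity(x, y) {}
    void update() { y -= 5.0f; }  // bullets just travel in a straight line
};

class Monster : public Entity {
public:
    Monster(float x, float y) : Entity(x, y) {}
    void update() { x += 1.0f; }  // placeholder: chase the player, wander, etc.
};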

Tuesday, April 17, 2007

10 Tips To Effective Search Engine Optimization, Submission And Promotion

-Make sure your website is tested for good load time, dead links and cross-browser compatibility. Netmechanic.com is a good website to do this.
-Optimize your web site pages by making sure your top keywords appear in your title, meta tags and content.
-Make sure you provide quality content that has something unique to offer and that has keywords or key phrases people might search to find your site.
-If you sell products, give something away free (The word "free" is one of the top most searched words on the internet).
-Build quality links to your site pages from other well ranked sites on your target search engine.
-Make a list of your top keywords and key phrases, track your listings/ranking in the top 10 search engines, and analyze them periodically.
-Submit to top directories: there are many directories out there that list websites in their related categories. Just find the top-ranked ones and submit your pages under the related sections.
-Make sure your web site looks attractive and is easy to use, with clear navigation and an easy-to-read layout and fonts. This is especially important for directories, as manual approval is required for them.
-Keep up to date with the latest in SEO by reading through articles, forums and related guides - remember SEO requires a lot of time and patience. You have to keep working on it.
-Keep your website focused on catering to your visitors' needs - don't throw in a free-for-all links program or forum just to attract traffic. In the long run this will pay off, as your website will be recognized for what it is.

Convert Excel files to simple Wikimedia and HTML tables

I recently installed a Wiki at work, but our users aren't exactly computer experts, and Wiki experts even less so. So I wanted to make a tool to allow them to upload an Excel file and have it spit out Wiki code for a table. I started in ASP, but just uploading a file and reading other form values at the same time was enough to make me tear my hair out. (Yes, I looked at the ASP upload posts here, but most of it was far too complicated for what I was doing.)
Since Wikimedia is in PHP anyway, I figured I'd see if someone had created an Excel reader in PHP, and lo and behold, I found the aptly-named PHP-ExcelReader on Sourceforge.net. It works quite well, most of the bugs have been fixed, and it's pretty easy to use.
The first page has a simple form and instructions. It allows the user to select their file, the rows that contain a title, header rows, and the first data row. I also added some optional style selections, such as positioning (none/center/float left/float right), font/bg colors, cell padding, cell spacing, and border width. After clicking "Next," the Wikimedia table code is displayed, along with a rendition of the table itself (in HTML, of course). Note: if you decide to modify this to create HTML code and strip out the Wiki part, the positioning/floating is not included (for my purposes, the HTML table is only to allow the user to see roughly what it will look like by itself).
So far, it handles vertically or horizontally merged cells; however, if a cell is merged over multiple rows and columns, data starts disappearing. That is a problem I couldn't figure out within the time I allotted myself (I have plenty of REAL work to do). I had some trouble displaying dates correctly, but I'm not sure if the problem was with PHP-ExcelReader, my server, or something in my code (I don't think so). Also, if a column is used and later all of its cells are emptied, it seems to think it's still a used cell, despite the lack of any data.
It does NOT, however, use any style or formatting from the Excel file: just data. No bold/italics, no font size changes, no charts or images (of course), and it does not know how wide the columns are.
So it is very simple and not feature-rich, but I thought others might find it somewhat useful, even if just as an example of how to use PHP-ExcelReader. I'd be interested to know if anyone finds it useful or notices any obvious bugs.
The attached ZIP file contains three PHP files: index.php contains my code, while reader.php and oleread.inc are the files from PHP-ExcelReader (unmodified). In index.php there are two fields that may need to be edited for your use: $thisfile can be changed if you don't want your file called index.php, and $dropdir needs to be writable by your web server.
The file I uploaded disappeared. Here is a link to the file from another post

Monday, April 16, 2007

C++ and Java Syntax Differences Cheat Sheet

main function
C++ // free-floating function
int main( int argc, char* argv[])
{
printf( "Hello, world" );
}
Java // every function must be part of a class; the main function for a particular
// class file is invoked when java is run (so you can have one
// main function per class--useful for writing unit tests for a class)
class HelloWorld
{
public static void main(String args[])
{
System.out.println( "Hello, World" );
}
}
Compiling
C++ // compile as
g++ foo.cc -o outfile
// run with
./outfile

Java // compile classes in foo.java to .class files
javac foo.java
// run by invoking the static main method of the class
java HelloWorld

Comments
Same in both languages (// and /* */ both work)
Class declarations
Almost the same, but Java does not require a semicolon
C++ class Bar {};

Java class Bar {}

Method declarations
Same, except that in Java a method must always be part of a class and may be prefixed with public/private/protected
Constructors and destructors
The constructor has the same syntax in both (the name of the class); Java has no exact equivalent of the destructor
Static member functions and variables
Same as method declarations, but Java provides static initialization blocks to initialize static variables (instead of putting a definition in a source code file):
class Foo
{
static private int x;
// static initialization block
static { x = 5; }
}
Object declarations
C++ // on the stack
myClass x;
// or on the heap
myClass *x = new myClass;

Java // always allocated on the heap (also, always need parens for constructor)
myClass x = new myClass();

References vs. pointers
C++ // references are immutable, use pointers for more flexibility
int bar = 7, qux = 6;
int& foo = bar;

Java // references are mutable and store addresses of objects only; there are
// no raw pointers
myClass x;
x.foo(); // error, x is a null ``pointer''
// note that you always use . to access a field

Inheritance
C++ class Foo : public Bar
{ ... };

Java class Foo extends Bar
{ ... }

Protection levels (abstraction barriers)
C++ public:
void foo();
void bar();

Java public void foo();
public void bar();

Virtual functions
C++ virtual int foo(); // or, non-virtually as simply int foo();

Java // functions are virtual by default; use final to prevent overriding
int foo(); // or, final int foo();

Abstract classes
C++ // just need to include a pure virtual function
class Bar { public: virtual void foo() = 0; };

Java // syntax allows you to be explicit!
abstract class Bar { public abstract void foo(); }
// or you might even want to specify an interface
interface Bar { public void foo(); }
// and later, have a class implement the interface:
class Chocolate implements Bar
{
public void foo() { /* do something */ }
}

Memory management
Roughly the same--new allocates, but no delete in Java since it has garbage collection.
NULL vs. null
C++ // initialize pointer to NULL
int *x = NULL;

Java // the compiler will catch the use of uninitialized references, but if you
// need to initialize a reference so it's known to be invalid, assign null
myClass x = null;

Booleans
Java is a bit more verbose: you must write boolean instead of merely bool.
C++ bool foo;
Java boolean foo;
Const-ness
C++ const int x = 7;

Java final int x = 7;

Throw Spec
First, Java enforces throw specs at compile time--you must document if your method can throw an exception.
C++ int foo() throw (IOException)
Java int foo() throws IOException
Arrays
C++ int x[10];
// or
int *x = new int[10];
// use x, then reclaim memory
delete[] x;

Java int[] x = new int[10];
// use x, memory reclaimed by the garbage collector or returned to the
// system at the end of the program's lifetime

Collections and Iteration
C++
Iterators are members of classes. The start of a range is .begin(), and the end is .end(). Advance using the ++ operator, and access using *.
vector<int> myVec;
for ( vector<int>::iterator itr = myVec.begin();
itr != myVec.end();
++itr )
{
cout << *itr;
}

Java
Iterator is just an interface. You get an iterator with .iterator(), check whether you're at the end with itr.hasNext(), and get the next element with itr.next() (a combination of using ++ and * in C++).
ArrayList myArrayList = new ArrayList();
Iterator itr = myArrayList.iterator();
while ( itr.hasNext() )
{
System.out.println( itr.next() );
}
// or, in Java 5
ArrayList myArrayList = new ArrayList();
for( Object o : myArrayList ) {
System.out.println( o );
}

Templates
This section is still to be added. See http://java.sun.com/j2se/1.5/pdf/generics-tutorial.pdf for a good introduction to Java generics.

Understanding the Start of an Object's Lifetime

In C++, whenever an object of a class is created, its constructor is called. But that's not all--its parent class constructor is called, as are the constructors for all objects that belong to the class. By default, the constructors invoked are the default ("no-argument") constructors. Moreover, all of these constructors are called before the class's own constructor is called.
For instance, take the following code:
#include <iostream>
class Foo
{
public:
Foo() { std::cout << "Foo's constructor" << std::endl; }
};
class Bar : public Foo
{
public:
Bar() { std::cout << "Bar's constructor" << std::endl; }
};
int main()
{
// a lovely elephant ;)
Bar bar;
}
The object bar is constructed in two stages: first, the Foo constructor is invoked and then the Bar constructor is invoked. The output of the above program will indicate that Foo's constructor is called first, followed by Bar's constructor. Why do this? There are a few reasons. First, each class should need to initialize only the things that belong to it, not things that belong to other classes, so a child class should hand off the work of constructing the portion that belongs to the parent class. Second, the child class may depend on these fields when initializing its own fields; therefore, the parent's constructor needs to be called before the child class's constructor runs. In addition, all of the objects that belong to the class should be initialized so that the constructor can use them if it needs to. But what if you have a parent class that needs to take arguments to its constructor? This is where initialization lists come into play. An initialization list immediately follows the constructor's signature, separated by a colon:
class Foo : public parent_class
{
Foo() : parent_class( "arg" ) // sample initialization list
{
// you must include a body, even if it's merely empty
}
};
Note that to call a particular parent class constructor, you just need to use the name of the class (it's as though you're making a function call to the constructor). For instance, in our above example, if Foo's constructor took an integer as an argument, we could do this:
#include <iostream>
class Foo
{
public:
Foo( int x )
{
std::cout << "Foo's constructor "
<< called with "
<< x
<< std::endl;
}
};
class Bar : public Foo
{
public:
Bar() : Foo( 10 ) // construct the Foo part of Bar
{
std::cout << "Bar's constructor" << std::endl;
}
};
int main()
{
Bar stool;
}
Using Initialization Lists to Initialize Fields
In addition to letting you pick which constructor of the parent class gets called, the initialization list also lets you specify which constructor gets called for the objects that are fields of the class. For instance, if you have a string inside your class:
class Qux
{
public:
Qux() : _foo( "initialize foo to this!" ) { }
// This is nearly equivalent to
// Qux() { _foo = "initialize foo to this!"; }
// but without the extra call to construct an empty string
private:
std::string _foo;
};
Here, the constructor is invoked by giving the name of the object to be constructed rather than the name of the class (as in the case of using initialization lists to call the parent class's constructor). If you have multiple fields of a class, then the names of the objects being initialized should appear in the order they are declared in the class (and after any parent class constructor call):
class Baz
{
public:
Baz() : _foo( "initialize foo first" ), _bar( "then bar" ) { }
private:
std::string _foo;
std::string _bar;
};
Initialization Lists and Scope Issues
If you have a field of your class that has the same name as the argument to your constructor, then the initialization list "does the right thing." For instance,
class Baz
{
public:
Baz( std::string foo ) : foo( foo ) { }
private:
std::string foo;
};
is roughly equivalent to
class Baz
{
public:
Baz( std::string foo )
{
this->foo = foo;
}
private:
std::string foo;
};
That is, the compiler knows which foo belongs to the object, and which foo belongs to the function.
Initialization Lists and Primitive Types
It turns out that initialization lists work to initialize both user-defined types (objects of classes) and primitive types (e.g., int). When the field is a primitive type, giving it an argument is equivalent to assignment. For instance,
class Quux
{
public:
Quux() : _my_int( 5 ) // sets _my_int to 5
{ }
private:
int _my_int;
};
This behavior allows you to specify templates where the templated type can be either a class or a primitive type (otherwise, you would have to have different ways of handling initialization of fields of the templated type for the cases of classes and primitive types):
template <class T>
class my_template
{
public:
// works as long as T has a copy constructor
my_template( T bar ) : _bar( bar ) { }
private:
T _bar;
};
Initialization Lists and Const Fields
Using initialization lists to initialize fields is not always necessary (although it is probably more convenient than other approaches). But it is necessary for const fields. If you have a const field, then it can be initialized only once, so it must be initialized in the initialization list:
class const_field
{
public:
const_field() : _constant( 1 ) { }
// this is an error: const_field() { _constant = 1; }
private:
const int _constant;
};
When Else do you Need Initialization Lists?
No Default Constructor
If you have a field that has no default constructor (or a parent class with no default constructor), you must specify which constructor you wish to use.
References
If you have a field that is a reference, you also must initialize it in the initialization list; since references are immutable, they can be initialized only once.
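Here is a brief sketch covering both of these cases (the class names are invented for this example, and <string> is assumed to be included):
class Engine
{
public:
    Engine( int horsepower ) : _hp( horsepower ) { }   // no default constructor
private:
    int _hp;
};

class Car
{
public:
    // both members must appear in the initialization list: _engine has no
    // default constructor, and _owner is a reference
    Car( int hp, std::string& owner ) : _engine( hp ), _owner( owner ) { }
private:
    Engine _engine;
    std::string& _owner;
};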
Initialization Lists and Exceptions
Since constructors can throw exceptions, it's possible that you might want to handle exceptions that are thrown by constructors invoked as part of the initialization list. First, you should know that even if you catch the exception, it will get rethrown, because it cannot be guaranteed that your object is in a valid state when one of its fields (or parts of its parent class) couldn't be initialized. That said, one reason you'd want to catch an exception here is that there's some kind of translation of error messages that needs to be done. The syntax for catching an exception in an initialization list is somewhat awkward: the 'try' goes right before the colon, and the catch goes after the body of the function:
class Foo
{
Foo() try : _str( "text of string" )
{
}
catch ( ... )
{
std::cerr << "Couldn't create _str";
// now, the exception is rethrown as if we'd written
// "throw;" here
}
};
Initialization Lists: Summary
Before the body of the constructor is run, all of the constructors for its parent class and then for its fields are invoked. By default, the no-argument constructors are invoked. Initialization lists allow you to choose which constructor is called and what arguments that constructor receives. If you have a reference or a const field, or if one of the classes used does not have a default constructor, you must use an initialization list.

Wednesday, April 11, 2007

Gotchas for a C programmer using C++

Implicit Assignment from void*
You cannot implicitly assign from a void* to any other type. For instance, the following is perfectly valid in C (in fact, it's arguably the preferable way of doing it in C):
int *x = malloc(sizeof(int) * 10);
but it won't compile in C++. (Try it yourself!) The explanation from Bjarne Stroustrup himself is that this isn't type safe. What this means is that you can have a void* that points to anything at all, and if you then assign the address stored in that void* to another pointer of a different type, there isn't any warning at all about it. Consider the following:
int an_int;
void *void_pointer = &an_int;
double *double_ptr = void_pointer;
*double_ptr = 5;
When you assign *double_ptr the value 5, it writes 8 bytes of memory (on a typical machine where double is 8 bytes), but the integer variable an_int is only 4 bytes. Forcing a cast from a void pointer makes the programmer pay attention to these things.
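In C++ you therefore have to make the conversion explicit, or avoid malloc altogether. A minimal sketch of the options (a fragment; <stdlib.h> provides malloc):
#include <stdlib.h>

int *x = (int *)malloc(sizeof(int) * 10);               /* explicit cast: valid in both C and C++ */
int *y = static_cast<int *>(malloc(sizeof(int) * 10));  /* the more idiomatic C++ spelling */
int *z = new int[10];                                    /* or skip malloc in C++ entirely */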
Freeing arrays: new[] and delete[]
In C, there's only one major memory allocation function: malloc. You use it to allocate both single elements and arrays:
int *x = malloc( sizeof(int) );
int *x_array = malloc( sizeof(int) * 10 );
and you always release the memory in the same way:
free( x );
free( x_array );
In C++, however, memory allocation for arrays is somewhat different than for single objects; you use the new[] operator, and you must match calls to new[] with calls to delete[] (rather than to delete):
int *x = new int;
int *x_array = new int[10];
delete x;
delete[] x_array;
The short explanation is that when you have arrays of objects, delete[] will properly call the destructor for each element of the array, whereas delete will not.
You must declare functions before use
Although most good C code will follow this convention, in C++ it is strictly enforced that all functions must be declared before they are used. This code is valid C, but it is not valid C++:
#include <stdio.h>
int main()
{
foo();
return 0;
}
int foo()
{
printf( "Hello world" );
}
Gotchas for a C++ programmer using C
Structs and Enums
You have to include the struct keyword before the name of the struct type to declare a struct. In C++, you could do this:
struct a_struct
{
int x;
};
a_struct struct_instance;
and have a new instance of a_struct called struct_instance. In C, however, we have to include the struct keyword when declaring struct_instance:
struct a_struct struct_instance;
In fact, a similar situation also holds for declaring enums: in C, you must include the keyword enum; in C++, you don't have to. As a side note, most C programmers get around this issue by using typedefs:
typedef struct struct_name
{
/* variables */
} struct_name_t;
Now you can declare a struct with struct_name_t struct_name_t_instance;
But there is another gotcha for C++ programmers: you must still use the "struct struct_name" syntax to declare a struct member that is a pointer to the struct:
typedef struct struct_name
{
struct struct_name *instance;
struct_name_t *instance2; /* invalid! The typedef isn't defined yet */
} struct_name_t;
C++ has a much larger library
C++ has a much larger library than C, and some things may be automatically linked in by C++ when they are not with C. For instance, if you're used to using g++ for math-heavy computations, then it may come as a shock that when you are using gcc to compile C, you need to explicitly link against the math library for things like sin or even sqrt:
% g++ foo.cc
or
% gcc foo.c -lm
No Boolean Type
C does not provide a native boolean type (at least not before C99's _Bool). You can simulate it using an enum, though:
typedef enum {FALSE, TRUE} bool;
main Doesn't Provide return 0 Automatically
In C++, you are free to leave off the statement 'return 0;' at the end of main; it will be provided automatically:
int main()
{
print( "Hello, World" );
}
but in C, you must manually add it:
int main()
{
print( "Hello, World" );
return 0;
}