When a program crashes it can leave behind a core file (sometimes you need to do some configuration to make sure this happens and to control what the file is called).
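The configuration mentioned above usually comes down to two things: the shell's core size limit and, on Linux, the kernel's core naming pattern. The following is a minimal sketch assuming a Linux system. The pattern shown in the comment is only an illustration chosen to match the core file path used later in this article, not a required setting.

```shell
# Allow core files of any size for this shell and the programs it starts.
# This can fail if the hard limit is lower, so report it instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core size limit" >&2

# Show the current limit; 0 means no core file will be written.
ulimit -c

# On Linux the core file name comes from this kernel setting (assumption: Linux).
if [ -r /proc/sys/kernel/core_pattern ]; then
    cat /proc/sys/kernel/core_pattern
fi

# Changing the pattern needs root. An illustrative example (not run here):
#   echo '/var/crash/corefiles/core.%e.%u.%p' | sudo tee /proc/sys/kernel/core_pattern
# where %e is the executable name, %u the user id and %p the process id.
```

Once a core file exists, load it together with the program that produced it: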
gdb myprog core
(This loads the information dumped when the program crashed)
bt (or backtrace)
provides the call stack at the point where the error occurred. It shows you exactly on which line of which file the problem occurred (if you have debug symbols).
info locals
prints out information about local variables. Usually at this point you will go "oh damn, that should not be NULL" and realize what needs to be fixed.
print <variable> (or p <variable>)
if you want to check a particular variable.
up
down
move up or down the stack to look at variables in a different scope.
If you are writing a service, you can create a script that loads a core, runs bt and info locals, and emails you the result. For example:
# See if there was a crash
corefile=/var/crash/corefiles/core.${PROCESS}.$UID.$pid
if [ -f "$corefile" ]; then
    # Feed the gdb commands in gdb_script to gdb and capture the output
    gdb "$bindir/${PROCESS}" "$corefile" < "$basedir/sh/gdb_script" > mail.out
    # Send the captured output as the body of the mail
    mail -s "${PROCESS} crashed" me@email.com < mail.out
    rm mail.out
fi
Where gdb_script just contains:
backtrace
info locals
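As an alternative to feeding the commands on standard input, gdb can read them from a file with -x and exit when done with -batch. This is a rough sketch only: the paths ./myprog, ./core, ./gdb_script and report.txt are made-up placeholders, and gdb is only run if it is installed and the core file exists.

```shell
# Hypothetical paths; substitute your own binary and core file.
prog=./myprog
core=./core
gdb_script=./gdb_script

# Write the same two commands the article uses into the command file.
printf '%s\n' 'backtrace' 'info locals' > "$gdb_script"

# -batch makes gdb exit after running the commands; -x names the command file.
if command -v gdb >/dev/null 2>&1 && [ -f "$core" ]; then
    gdb -batch -x "$gdb_script" "$prog" "$core" > report.txt 2>&1
fi
```

The upside of -batch is that gdb never waits at a prompt, which is what you want in an unattended script.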
Sometimes (especially with C++) the stack gets corrupted and the backtrace provides no useful information. Then you have to resort to printfs in the code, interactive GDB, or a memory checker such as electric-fence or valgrind.
Logging
If you are writing a non-trivial program, chances are you have a log file that you write to. The more useful information you have in the log, the less hit and miss you need to narrow down the problem. A big part of this usefulness is simply choosing unique and searchable strings for your output, and formatting the information so that it is relatively easy to grep for. Some people like to put file/function/line information in the log. This can be useful, but my preference is to stick with a timestamp and a concise, unique message which includes not only a description of the event, but any id information linked to the event. I never have the same log string twice in a program if I can help it.
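To make the idea of a timestamp plus a unique, greppable message concrete, here is a small sketch in shell. The function name log_event, the log path and the event tags and ids are all invented for illustration; the same shape carries over to whatever language your program is written in.

```shell
# Hypothetical log location; override by setting LOGFILE before running.
LOGFILE=${LOGFILE:-./myservice.log}

# log_event TAG MESSAGE...
# Appends one line: timestamp, a tag that is unique in the program, then details.
log_event() {
    tag=$1
    shift
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$tag" "$*" >> "$LOGFILE"
}

# Each call site uses its own tag and includes the ids tied to the event.
log_event ORDER_RECEIVED "order_id=1042 customer_id=77"
log_event ORDER_REJECTED "order_id=1042 reason=empty_cart"

# Because the id is written the same way everywhere, one grep finds its history.
grep 'order_id=1042' "$LOGFILE"
```

Since every tag appears at exactly one place in the code, finding a tag in the log also tells you where in the program the line was written.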
Two things that will save you hours of pain are:
1) Always expect the worst: always check for bad inputs or undefined cases, and at the very least write an error message to your log file. Always check that pointers passed into a function are not NULL before dereferencing them, always check for indices outside the array bounds, and always include the else in if trees and the default in case statements. In large projects the unexpected always happens, and the sooner you can localise where, the better. If your program doesn't handle it, at least you KNOW about it and can fix it.
2) Write to the log file regularly. A good rule of thumb for me is to log whenever processing something external, changing state, or responding to some event. The more information you have about what happened leading up to the problem, the better. What you don't want is repetitive or overly verbose spam, as it can bury the real information. However, so long as you use unique strings, you can usually use text processing techniques to quickly dig up what you want.
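The two points above can be sketched together in shell: check the input, give the case statement a default branch, and log every path with its own string. The names handle_command and log_event, the tags and the log path are all hypothetical. A C program would do the same with NULL and bounds checks in place of the empty-string test.

```shell
# Hypothetical log location; override by setting LOGFILE before running.
LOGFILE=${LOGFILE:-./myservice.log}

# Append a timestamped line with a unique tag and the event details.
log_event() {
    tag=$1
    shift
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$tag" "$*" >> "$LOGFILE"
}

# handle_command CMD ARG
# Returns 0 for a known command, 1 for bad or unexpected input.
handle_command() {
    cmd=$1
    arg=$2
    # Bad input: log it and bail out rather than carrying on blindly.
    if [ -z "$cmd" ]; then
        log_event HANDLE_CMD_EMPTY "no command given"
        return 1
    fi
    case "$cmd" in
        start|stop)
            log_event HANDLE_CMD_OK "cmd=$cmd arg=$arg"
            ;;
        *)
            # The default branch: the "impossible" case still leaves a trace.
            log_event HANDLE_CMD_UNKNOWN "cmd=$cmd arg=$arg"
            return 1
            ;;
    esac
}

handle_command start job-7
handle_command restart job-7 || true
handle_command "" || true

# Count how often the unexpected branch was hit.
grep -c 'HANDLE_CMD_UNKNOWN' "$LOGFILE"
```

Because each branch has its own tag, a single grep or grep -c tells you which path was taken and how often, even in a noisy log.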
Less is better than VIM for log files
A final note on using log files: Use less to read them.
less is a command just like more, but is ironically much more powerful.
It has all the searching functionality of VIM, but it is a better choice than VIM because if your log file is large (which it will be) VIM uses vast system resources (memory, temporary files) and is slow to load.
Some common shortcuts you should know [work in less and VIM]:
(shift)G - go to end of file
?(string) - search backward for (string)
/(string) - search forward for (string)
n - repeat last search (in last direction)
In less, pressing shift+F makes it start acting like "tail -f". You can stop the auto refreshing with Ctrl+C.
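The same shortcuts can be given to less on the command line with a leading +, which saves a keystroke when you already know where you want to start. The helper names below are made up for illustration, and the functions are only defined here, not called, since less is interactive.

```shell
# Hypothetical helpers; pass the path of your own log file.

# Open a log positioned at the end (same as pressing shift+G after opening it).
log_end() {
    less +G "$1"
}

# Open a log in follow mode (same as pressing shift+F); Ctrl+C stops following.
log_follow() {
    less +F "$1"
}

# Open a log at the first match of a string (same as typing /string).
# Usage: log_find STRING FILE
log_find() {
    less +/"$1" "$2"
}
```

For example, log_find 'order_id=1042' myservice.log opens the file at the first line mentioning that id, and n then jumps to the next one.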
Keep a separate editor open that you can copy and paste important lines into. This lets you build up a timeline of important events, which will help you understand why the impossible has happened.