This post was originally written for the A-State CS Department wiki on March 22, 2014. It provides a beginner’s overview of using gdb
.
So, your program crashed...
While testing a program involving pointers, it is common to experience a program crash related to a bad pointer value somewhere in the code. On Linux systems, the message displayed when your program crashes is often “Segmentation Fault” — meaning the actual “crime” your code committed (as judged by the OS) was to try to access memory outside the “segment” in which it was allocated. Without getting to far into the details of how operating systems handle memory, suffice it to say that this shouldn’t have happened unless you have mis-handled a pointer (or an array access) in some way.
Segmentation faults and other crashes related to pointer mis-handling
can be tricky to track down using program tracing techniques (like
adding “debug output”) alone. To find the source of these problems, it
really helps to be able to “see” exactly what the program was trying
to do at the moment that the crash occurred. Debuggers are special
programs designed to help us do exactly that. One such debugger that is
available on our class virtual servers is the GNU Project Debugger,
known as “GDB” and accessed through the ‘gdb
’ command.
What is GDB?
The following was taken from the official GNU Project Debugger website, under the heading “What is GDB?”:
GDB, the GNU Project debugger, allows you to see what is going on ‘inside’ another program while it executes – or what another program was doing at the moment it crashed.
GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:
- Start your program, specifying anything that might affect its behavior.
- Make your program stop on specified conditions.
- Examine what has happened, when your program has stopped.
- Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.
The program being debugged can be written in Ada, C, C++, Objective-C, Pascal (and many other languages). Those programs might be executing on the same machine as GDB (native) or on another machine (remote). GDB can run on most popular UNIX and Microsoft Windows variants.
Learn by Example
GDB has a substantial user manual, and provides an extensive set of commands for controlling the debugging session. In reality, you can get a lot of benefit from GDB with only a small subset of the total available command set. This will be shown by working through an example below, given some code that is crashing with a Segmentation Fault error. The commands that will be used in this example are:
run
: Start program execution from the beginning of the program. Allows command-line arguments to be listed, and also allows basic I/O redirection.
backtrace
: If (or when) a program has crashed, this command will show trace of where you are currently (which functions you are in). The “trace” consists of the current function call stack, where each call (referred to as a “frame”) is shown in order from most recent to earliest (which will be the entry into main()
), and each is numbered for reference.
backtrace full
: Just like backtrace
, but local variables are also shown for each frame.
The Program
The program for this example will consist of a single header file
(olist_with_bug.h
) and a single source file (debugexample.cpp
).
The header file contains a class definition and implementation for a
singly-linked ordered list. The source file contains a main()
function
and some code to fill the list, print it to the screen, remove some
elements, and print it out again. The source code for each file is shown
below:
olist_with_bug.h
/**
* @file olist.h
* Defines OList Ordered Linked List
*
* OList is a minimally functional singly-linked ordered list
* designed to contain integer values.
*/
#ifndef OLLIST_H
#define OLLIST_H
#include<iostream>
#include<string>
/**
* OList is a minimally functional singly-linked list of integer
* values.
*/
class OList{
public:
// ctor, dtor
OList() : head(0) , m_count(0) { }
~OList() { erase(); }
// methods
void insert(int value); /// insert a new item
bool remove(int value); /// remove an item, if it is there
bool contains(int value) const; /// see if the list contains an item
int count() const { return m_count; } /// get the current number of items in the list
bool is_empty() const { return head == 0; } /// see if the list is empty
void erase(); /// erase all values in the list
std::ostream& print(std::ostream& stream=std::cout, std::string delim=", ") const; /// print the list to the specified stream
private:
/**
* container node for an integer and a pointer to the next item in the list.
*/
class Node {
public:
// ctor
Node(int value)
: data(value), next(0) { } /// Node requires data value to construct
// Attributes:
int data; /// data value
Node* next; /// pointer to the next Node in the list
};
Node* _find(int value) const;
Node* _find(int value, Node*& previous) const;
Node* head;
int m_count;
};
/**
* inserts a new data value in the ordered list
* @param value the new data item (an integer) to insert
*/
void OList::insert(int value){
OList::Node* previous;
OList::Node* newnode = new Node(value);
_find(value, previous);
if(!is_empty()){
newnode->next = previous->next;
previous->next = newnode;
}
else{
head = newnode;
}
m_count++;
}
/**
* remove an item from the list if it is there
* @param value item to remove
* @return true if item was found and removed, false otherwise
*/
bool OList::remove(int value){
OList::Node* previous;
OList::Node* victim = _find(value, previous);
bool found = victim != 0;
if(victim){
previous->next = victim->next;
if(victim == head){
head = victim->next;
}
delete victim;
m_count--;
}
return found;
}
/**
* returns true if the list contains the value
* @param value item to look for
* @return true if item is found in the list, false otherwise
*/
bool OList::contains(int value) const {
return _find(value) != 0;
}
/**
* erase the entire list, leaving it empty
*/
void OList::erase(){
for(OList::Node* current = head; current != 0;){
OList::Node* victim = current;
current = current->next;
delete victim;
}
m_count = 0;
head = 0;
}
/**
* print the list to the stream specified by the first parameter,
* delimited by the string given in the second parameter
* @param stream std::ostream stream to send output to
* @param delim string to place between items (defaults to comma-separated)
* @return the modified stream object is returned
*/
std::ostream& OList::print(std::ostream& stream, std::string delim) const{
for(OList::Node* current = head; current != 0; current=current->next){
stream << current->data << (current->next != 0 ? delim : "");
}
return stream;
}
/**
* (internal use) - finds location of a value in the list,
* @param value value to find
* @return pointer to the Node where `value` was found, or NULL if not found
*/
OList::Node* OList::_find(int value) const{
OList::Node* n;
return _find(value, n);
}
/**
* (internal use) - finds location of a value in the list, along with
* the node immediately to its "left" (the previous node) in the list.
* In cases where the node isn't in the list, the previous pointer will
* be set to the node immediately to the "left" of where the node should
* be located in the list. This makes the function useful for determining
* where a node should be added in the list.
* @param value value to find
* @param[out] previous pointer to the previous Node in the list (or NULL if the item's position is at head)
* @return pointer to the Node where `value` was found, or NULL if not found
*/
OList::Node* OList::_find(int value, Node*& previous)const {
OList::Node* current = head;
previous = 0;
while(current && current->data < value){
previous = current;
current = current->next;
}
return current && current->data == value ? current : 0;
}
#endif
debugexample.cpp
#include<iostream>
#include<cstdlib>
#include<ctime>
#include<string>
//#include "olist_with_bug.h"
#include "olist_with_bug.h"
int main()
{
// Un-comment the appropriate line below to
srand(time(0)); // produce different pseudo-random values on every run
// ... or ...
//srand(5); // Produce the same set of "random" values every time.
OList l;
// Add some values
for(int i = 0; i < 30; i++){
l.insert(rand() % 100);
}
std::cout << "Initial List: ";
l.print() << "\n\n";
// Remove some values:
for(int i = 0; i < 30; i++){
l.remove(rand() % 100);
}
std::cout << "Final List: ";
l.print() << "\n\n";
return 0;
}
Compile and Run
At this point, the usual thing to do is to try to compile and run the program as usual. The corresponding terminal session is shown below:
exampleuser@exampleserver:~/OList_with_error$ g++ -W -Wall -ansi -pedantic debugexample.cpp
exampleuser@exampleserver:~/OList_with_error$ ./a.out
Initial List: 3, 7, 8, 12, 17, 20, 21, 22, 24, 25, 25, 30, 33, 33, 36, 48, 53, 53, 57, 61, 63, 64, 71, 78, 82, 85, 86, 87, 90, 92
Final List: 3, 7, 8, 12, 17, 20, 21, 25, 25, 30, 33, 33, 53, 53, 57, 61, 63, 64, 71, 82, 87, 90, 92
exampleuser@exampleserver:~/OList_with_error$ ./a.out
Segmentation fault
Notice here that on the first run, everything went OK. The main program adds data values at random to the list, then tries to remove them at random... In some cases, the program may run as expected. But in other cases, the program may crash with a Segmentation fault, and it did that on the second run.
This behavior is typical of the way these sorts of errors behave in the real world. It is often the case that these errors only show up intermittently; reproducing them during testing can sometimes be difficult. A good approach is to make sure that your testing code exercises all possible “edge conditions” in the source - for an ordered list, that would mean adding to an empty list, adding a value that would be inserted at the head, and adding a value that would be inserted at the end of the list (in addition to the “normal” case of the value being added in the middle somewhere).
Compiling for Debugging
The compiler flags shown above did not produce an executable file that was ideal for debugging. To be most useful, the debugger will need information like the original source code, the line numbers, and the symbol (variable and function) names. This information is removed from the executable produced by the compiler by default (and for good reason -- it increases the output file size significantly). But since the program crashed and a debugging session would be helpful, the program should now be re-compiled in such a way that the compiler will produce a debugger-friendly executable.
Since the debugger used in this example is GDB, the compiler flag
-ggdb
will be added to the g++
command (the -g
flag tells the
compiler to produce debugging information, and the -ggdb
flag says to
produce that information optimized for the GDB debugger). The full
command is:
g++ -W -Wall -ansi -pedantic -ggdb debugexample.cpp
No additional compiler output is shown on the console window, but the
size of the executable file a.out
has increased (on the system being
used to produce this example, it was ~13KiB before and ~40KiB after).
The Debugging Session
Now that the program has been compiled with debugging information
included, a debugging session can be initiaited. The command to start
GDB is ‘gdb
’, and the program’s executable name should be passed as
a command-line argument. The full command to start debugging this
example (whose executable is named a.out
) is shown below:
gdb a.out
This assumes that a.out
is in the current working directory. The
actual terminal session (with the command and its output) is shown
below:
exampleuser@exampleserver:~/OList_with_error$ gdb a.out
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/exampleuser/OList_with_error/a.out...done.
(gdb)
At this point, GDB has loaded the executable file and is ready for a
command. The (gdb)
at the end is GDB’s own command prompt. First
thing’s first - run the program to try to reproduce the error:
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out
Program received signal SIGSEGV, Segmentation fault.
0x08048969 in OList::insert (this=0xbffff6b4, value=35) at olist_with_bug.h:65
65 newnode->next = previous->next;
(gdb)
In this case, the error occurred on the first try. If it had not, the
run
command could be repeated until the crash occurred.
Notice that the function call where the Segmentation fault occurred is
shown (OList::insert()
) and the actual parameters are also shown. At
this point it is clear that the object being operated on (this
) is
located at address 0xbffff6b4, and the value passed to the value
parameter was 35. There is nothing obviously wrong with either of those
actual parameters, so continue looking...
The line of code that was actually being executed when the SIGSEGV
occurred was line 65 of olist_with_bug.h
, which contains the code:
newnode->next = previous->next;
So, something bad happened on that line of code. Notice that there are
two pointer dereferences (by way of the ->
operator) on this line.
Either of those could be to blame. We need to know what values are
contained in the newnode
and previous
pointers. To find out, the
command backtrace full
can be used to show the function call stack
as well as the values of any local variables along the way:
(gdb) backtrace full
#0 0x08048969 in OList::insert (this=0xbffff6b4, value=35) at olist_with_bug.h:65
previous = 0x0
newnode = 0x804c038
#1 0x08048c63 in main () at debugexample.cpp:20
i = 3
l = {head = 0x804c008, m_count = 3}
(gdb)
From this output, it is clear that previous
is NULL (contains the
value 0x0, which is zero expressed as hexidecimal). Therefore, it was
the attempt to dereference previous
(in previous->next
) that caused
the fault. Dereferencing a NULL pointer is always illegal.
So the question is: “How did previous
come to be NULL in the first
place?”
To answer this, we can consult the source code itself. The code from the
insert()
function is shown below:
void OList::insert(int value){
OList::Node* previous;
OList::Node* newnode = new Node(value);
_find(value, previous);
if(!is_empty()){
newnode->next = previous->next; // how is `previous` NULL here?
previous->next = newnode;
}
else{
newnode->next = head;
head = newnode;
}
m_count++;
}
Looking at the path taken to arrive at that particular line, we see that
previous
is intended to receive its value on the line:
_find(value, previous);
Which is line 63 in the file. This means that the value being placed in
previous
by the _find()
method is NULL. So the next thing to do is
examine the _find()
method to see if there is a bug there, or perhaps
it is just being used incorrectly. The _find()
method (along with its
documentation) is:
/**
* (internal use) - finds location of a value in the list, along with
* the node immediately to its "left" (the previous node) in the list.
* In cases where the node isn't in the list, the previous pointer will
* be set to the node immediately to the "left" of where the node should
* be located in the list. This makes the function useful for determining
* where a node should be added in the list.
* @param value value to find
* @param[out] previous pointer to the previous Node in the list (or NULL if the item's position is at head)
* @return pointer to the Node where `value` was found, or NULL if not found
*/
OList::Node* OList::_find(int value, Node*& previous)const {
OList::Node* current = head;
previous = 0;
while(current && current->data < value){
previous = current;
current = current->next;
}
return current && current->data == value ? current : 0;
}
From reading the method’s documentation, it seems that previous
will
indeed be set to NULL if the item’s position is at the head of the
list. This makes sense — there is nothing to the logical “left” of
the head of a singly-linked list, after all.
So it probably isn’t an error that the _find()
method is setting
previous
to NULL; it just means that the item that is being inserted
belongs to the logical “left” of the current head of the list (thus
becoming the new head item). Has this situation been properly handled
in the insert()
method? Take another look:
void OList::insert(int value){
OList::Node* previous;
OList::Node* newnode = new Node(value);
_find(value, previous);
if(!is_empty()){
newnode->next = previous->next;
previous->next = newnode;
}
else{
head = newnode;
}
m_count++;
}
Now that it is clear that previous
will be set to NULL any time the
new node belongs at the head of the list, it is worth listing the
situations in which that will be the case:
- the list is empty, meaning that any item inserted will become the head
or
- the item being inserted is smaller than the item currently at head, and should become the new head of the list
Examining the code carefully shows that the “empty list” case has been
planned for, but the “new smallest item” case has not. In either
of these cases, the new node’s next pointer should be pointed to the
current head (which may be NULL if the list is empty, but that is not
a problem). Then, the head pointer must be pointed to the new node
(the item being inserted becomes the first item). In both cases, the
previous
pointer will be set to NULL by the call to _find()
, so that
can be used as the condition in the if
, replacing the check for
“not empty”:
if(previous != 0){ // replaces "if(!is_empty()){"
The else
clause must also be modified to properly set the new
node’s next pointer (for the “new first item” case):
else{
newnode->next = head; // link in `newnode` in case list is non-empty
head = newnode;
}
This means that the modifed insert()
function should look like:
void OList::insert(int value){
OList::Node* previous;
OList::Node* newnode = new Node(value);
_find(value, previous);
if(previous != 0){ // if list is non-empty and new node isn't a minimum
newnode->next = previous->next;
previous->next = newnode;
}
else{
newnode->next = head; // link in `newnode` in case list is non-empty
head = newnode;
}
m_count++;
}
Now it is once again time to compile and test the code. The GDB debugging session is still running, so it should be stopped before re-compiling:
(gdb) quit
A debugging session is active.
Inferior 1 [process 5617] will be killed.
Quit anyway? (y or n) y
exampleuser@exampleserver:~/OList_with_error$
The last line shown indicates that the shell prompt is back and waiting for a command. The compile (with debugger output) and test session (testing directly in GDB just in case) is shown below:
exampleuser@exampleserver:~/OList_with_error$ g++ -W -Wall -ansi -pedantic -ggdb debugexample.cpp
exampleuser@exampleserver:~/OList_with_error$ gdb a.out
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/exampleuser/OList_with_error/a.out...done.
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out
Initial List: 4, 7, 10, 11, 15, 20, 24, 25, 29, 35, 35, 36, 37, 39, 39, 41, 42, 48, 56, 57, 65, 67, 68, 73, 73, 82, 82, 86, 89, 96
Final List: 4, 7, 10, 11, 15, 20, 25, 29, 35, 35, 37, 39, 39, 48, 56, 57, 65, 68, 73, 73, 82, 89, 96
[Inferior 1 (process 6529) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out
Initial List: 0, 3, 13, 20, 20, 22, 23, 33, 34, 35, 38, 44, 46, 48, 49, 53, 58, 60, 62, 64, 68, 70, 70, 74, 81, 90, 90, 94, 95, 99
Final List: 0, 3, 13, 20, 20, 22, 23, 33, 34, 35, 38, 44, 46, 48, 49, 53, 60, 62, 68, 70, 70, 74, 81, 90, 99
[Inferior 1 (process 6533) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out
Initial List: 1, 6, 6, 7, 9, 11, 12, 13, 14, 15, 16, 17, 18, 24, 25, 31, 34, 38, 42, 44, 57, 65, 66, 68, 69, 69, 70, 73, 90, 94
Final List: 1, 6, 6, 7, 9, 11, 14, 16, 17, 18, 24, 31, 34, 44, 65, 66, 73, 90, 94
[Inferior 1 (process 6534) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out
Initial List: 1, 6, 14, 15, 16, 17, 19, 26, 34, 37, 38, 39, 40, 42, 43, 47, 50, 50, 51, 52, 57, 58, 63, 63, 69, 69, 70, 72, 80, 98
Final List: 1, 6, 17, 19, 26, 34, 37, 38, 39, 40, 42, 43, 47, 50, 50, 51, 57, 58, 63, 69, 69, 70, 72, 98
[Inferior 1 (process 6537) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out
Initial List: 1, 4, 6, 12, 16, 17, 18, 21, 22, 27, 37, 42, 42, 43, 46, 48, 50, 51, 51, 56, 57, 61, 65, 74, 77, 92, 94, 94, 95, 96
Program received signal SIGSEGV, Segmentation fault.
0x080489d5 in OList::remove (this=0xbffff6b4, value=1) at olist_with_bug.h:85
85 previous->next = victim->next;
(gdb)
As you can see, the program ran well four times before crashing with another Segmentation fault on the fifth run!
This error is in a different place (line 85 in the remove()
function).
Once again, the actual parameters look reasonable. There are two
pointers being dereferenced in that line, so a backtrace with local
variable values might help shed some light on what happened:
(gdb) backtrace full
#0 0x080489d5 in OList::remove (this=0xbffff6b4, value=1) at olist_with_bug.h:85
previous = 0x0
victim = 0x804c0c8
found = true
#1 0x08048d29 in main () at debugexample.cpp:27
i = 29
l = {head = 0x804c0c8, m_count = 20}
(gdb)
Once again, the previous
pointer seems to be to blame, containing a
NULL value at the time it is being dereferenced. Back to the source
code! The remove()
function looks like:
bool OList::remove(int value){
OList::Node* previous;
OList::Node* victim = _find(value, previous);
bool found = victim != 0;
if(victim){
previous->next = victim->next;
if(victim == head){
head = victim->next;
}
delete victim;
m_count--;
}
return found;
}
And once again the _find()
helper method is being used. From the
earlier experience, it is clear that previous
will be set to NULL if
the value being searched for isn’t found or if it is found at the
head of the list. A quick look a the documentation for the _find()
method’s return value indicates that if the value is not found, the
return value is also NULL. That leaves only one situation: The value
being searched for must be at the beginning of the list.
Like before, it seems that the code is handling the “value not found” and “empty list” special case just fine (in fact, those cases are the same when trying to remove a value). It is the special case where the value is at the beginning of the list that is being ignored. In that case, the item at the head of the list is being removed... The code is already checking for this on the line:
if(victim == head){
All that is needed is to take advantage of this already-existing check
to specify that the previous
pointer should only be accessed when
the “victim” node is not the head node. This is most easily
accomplished by adding an else
clause for that if
statement and
moving the line causing the crash into it. The modified remove()
function looks like:
bool OList::remove(int value){
OList::Node* previous;
OList::Node* victim = _find(value, previous);
bool found = victim != 0;
if(victim){
if(victim == head){
head = victim->next;
}
else{
previous->next = victim->next; // only if `victim` is not `head`!
}
delete victim;
m_count--;
}
return found;
}
After making the code adjustment, stop the GDB session using the quit
command, then re-compile and test the program. At this point the program
no longer crashes under any sequence of data additions and deletions!
The Segmentation fault errors have been tracked down and squashed thanks
to some help from GDB.
Don’t forget to re-compile without the additional debugging information if you are intending to distribute your object/executable file (to reduce the file size).
Other Useful GDB Commands
This example only made use of the run
and backtrace full
commands
from GDB; it is compelling that only two commands can help speed the
correction of subtle coding errors so dramatically, but sometimes these
two just aren’t quite enough. Adding just a few more commands to your
GDB arsenal will allow you to debug a majority of simple (common)
programming errors of all kinds:
run
: Start program execution from the beginning of the program. Allows command-line arguments to be listed, and also allows basic I/O redirection.
backtrace
: If (or when) a program has crashed, this command will show trace of where you are currently (which functions you are in). The “trace” consists of the current function call stack, where each call (referred to as a “frame”) is shown in order from most recent to earliest (which will be the entry into main()
), and each is numbered for reference.
backtrace full
: Just like backtrace
, but local variables are also shown for each frame.
list
: List the source code surrounding the current execution point, or in the currently selected frame.
print
: Prints the value contained in a variable, or in some cases the result returned by a function that has been called in the current frame. print
also has several options available to select the desired representation of the value.
break
: create a breakpoint, which is a point at which execution of the program will stop during your debugging session so that you can examine local variables, etc.
step
: “steps into” next line of code - will drop into a function call
next
: like step
but will not drop into a function call
continue
: continue running the program as usual, until next breakpoint is encountered or program exits