Jason L Causey

Fixing Pointer-Related Segmentation Faults using GDB

This post was originally written for the A-State CS Department wiki on March 22, 2014. It provides a beginner’s overview of using gdb.

So, your program crashed...

While testing a program involving pointers, it is common to experience a program crash related to a bad pointer value somewhere in the code. On Linux systems, the message displayed when your program crashes is often “Segmentation Fault” — meaning the actual “crime” your code committed (as judged by the OS) was to try to access memory outside the “segment” in which it was allocated. Without getting to far into the details of how operating systems handle memory, suffice it to say that this shouldn’t have happened unless you have mis-handled a pointer (or an array access) in some way.

Segmentation faults and other crashes related to pointer mis-handling can be tricky to track down using program tracing techniques (like adding “debug output”) alone. To find the source of these problems, it really helps to be able to “see” exactly what the program was trying to do at the moment that the crash occurred. Debuggers are special programs designed to help us do exactly that. One such debugger that is available on our class virtual servers is the GNU Project Debugger, known as “GDB” and accessed through the ‘gdb’ command.

What is GDB?

The following was taken from the official GNU Project Debugger website, under the heading “What is GDB?”:

GDB, the GNU Project debugger, allows you to see what is going on ‘inside’ another program while it executes – or what another program was doing at the moment it crashed.

GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:

The program being debugged can be written in Ada, C, C++, Objective-C, Pascal (and many other languages). Those programs might be executing on the same machine as GDB (native) or on another machine (remote). GDB can run on most popular UNIX and Microsoft Windows variants.

Learn by Example

GDB has a substantial user manual, and provides an extensive set of commands for controlling the debugging session. In reality, you can get a lot of benefit from GDB with only a small subset of the total available command set. This will be shown by working through an example below, given some code that is crashing with a Segmentation Fault error. The commands that will be used in this example are:

run : Start program execution from the beginning of the program. Allows command-line arguments to be listed, and also allows basic I/O redirection.
backtrace : If (or when) a program has crashed, this command will show trace of where you are currently (which functions you are in). The “trace” consists of the current function call stack, where each call (referred to as a “frame”) is shown in order from most recent to earliest (which will be the entry into main()), and each is numbered for reference.
backtrace full : Just like backtrace, but local variables are also shown for each frame.

The Program

The program for this example will consist of a single header file (olist_with_bug.h) and a single source file (debugexample.cpp). The header file contains a class definition and implementation for a singly-linked ordered list. The source file contains a main() function and some code to fill the list, print it to the screen, remove some elements, and print it out again. The source code for each file is shown below:

olist_with_bug.h

/**
 * @file olist.h
 * Defines OList Ordered Linked List
 *
 * OList is a minimally functional singly-linked ordered list
 * designed to contain integer values.
 */

#ifndef OLLIST_H
#define OLLIST_H

#include<iostream>
#include<string>

/**
 * OList is a minimally functional singly-linked list of integer
 * values.  
 */
class OList{
    public:
        // ctor, dtor
        OList()     : head(0) , m_count(0)  {                   }                       
        ~OList()                            { erase();          }                       

        // methods
        void          insert(int value);                                                    /// insert a new item
        bool          remove(int value);                                                    /// remove an item, if it is there
        bool          contains(int value) const;                                            /// see if the list contains an item
        int           count() const         { return m_count;   }                           /// get the current number of items in the list
        bool          is_empty() const      { return head == 0; }                           /// see if the list is empty
        void          erase();                                                              /// erase all values in the list
        std::ostream& print(std::ostream& stream=std::cout, std::string delim=", ") const;  /// print the list to the specified stream

    private:
        /**
         * container node for an integer and a pointer to the next item in the list.
         */
        class Node {
            public:
                // ctor
                Node(int value) 
                    : data(value), next(0)  { }                                             /// Node requires data value to construct
                
                // Attributes:
                int   data;                                                                 /// data value
                Node* next;                                                                 /// pointer to the next Node in the list
        };

        Node* _find(int value) const;
        Node* _find(int value, Node*& previous) const;

        Node* head;
        int   m_count;
};

/**
 * inserts a new data value in the ordered list
 * @param value the new data item (an integer) to insert
 */
void OList::insert(int value){
    OList::Node* previous;
    OList::Node* newnode = new Node(value);
    _find(value, previous);
    if(!is_empty()){
        newnode->next  = previous->next;
        previous->next = newnode;
    }
    else{
        head           = newnode;
    }
    m_count++;
}

/**
 * remove an item from the list if it is there
 * @param  value item to remove
 * @return       true if item was found and removed, false otherwise
 */
bool OList::remove(int value){
    OList::Node* previous;
    OList::Node* victim = _find(value, previous);
    bool  found  = victim != 0;
    if(victim){
        previous->next = victim->next;
        if(victim == head){
            head = victim->next;
        }
        delete victim;
        m_count--;
    }
    return found;
}

/**
 * returns true if the list contains the value
 * @param  value item to look for
 * @return       true if item is found in the list, false otherwise
 */
bool OList::contains(int value) const {
    return _find(value) != 0;
}

/**
 * erase the entire list, leaving it empty
 */
void OList::erase(){
    for(OList::Node* current = head; current != 0;){
        OList::Node* victim = current;
        current             = current->next;
        delete victim;
    }
    m_count = 0;
    head    = 0;
}

/**
 * print the list to the stream specified by the first parameter, 
 * delimited by the string given in the second parameter
 * @param  stream std::ostream stream to send output to
 * @param  delim  string to place between items (defaults to comma-separated)
 * @return        the modified stream object is returned
 */
std::ostream& OList::print(std::ostream& stream, std::string delim) const{
    for(OList::Node* current = head; current != 0; current=current->next){
        stream << current->data << (current->next != 0 ? delim : "");
    }
    return stream;
}

/**
 * (internal use) - finds location of a value in the list, 
 * @param  value value to find
 * @return       pointer to the Node where `value` was found, or NULL if not found
 */
OList::Node* OList::_find(int value) const{
    OList::Node* n;
    return _find(value, n);
}

/**
 * (internal use) - finds location of a value in the list, along with
 * the node immediately to its "left" (the previous node) in the list.
 * In cases where the node isn't in the list, the previous pointer will
 * be set to the node immediately to the "left" of where the node should
 * be located in the list.  This makes the function useful for determining
 * where a node should be added in the list.
 * @param           value    value to find
 * @param[out]      previous pointer to the previous Node in the list (or NULL if the item's position is at head)
 * @return                   pointer to the Node where `value` was found, or NULL if not found
 */
OList::Node* OList::_find(int value, Node*& previous)const {
    OList::Node* current = head;
    previous             = 0;
    while(current && current->data < value){
        previous = current;
        current  = current->next;
    }
    return current && current->data == value ? current : 0;
}

#endif

debugexample.cpp

#include<iostream>
#include<cstdlib>
#include<ctime>
#include<string>

//#include "olist_with_bug.h"
#include "olist_with_bug.h"


int main()
{
    // Un-comment the appropriate line below to
    srand(time(0));     // produce different pseudo-random values on every run
    //  ... or ... 
    //srand(5);           // Produce the same set of "random" values every time.
    OList l;
    
    // Add some values
    for(int i = 0; i < 30; i++){
        l.insert(rand() % 100);
    }
    std::cout << "Initial List: ";
    l.print() << "\n\n";

    // Remove some values:
    for(int i = 0; i < 30; i++){
        l.remove(rand() % 100);
    }

    std::cout << "Final List: ";
    l.print() << "\n\n";

    return 0;
}

Compile and Run

At this point, the usual thing to do is to try to compile and run the program as usual. The corresponding terminal session is shown below:

exampleuser@exampleserver:~/OList_with_error$ g++ -W -Wall -ansi -pedantic debugexample.cpp 
exampleuser@exampleserver:~/OList_with_error$ ./a.out 
Initial List: 3, 7, 8, 12, 17, 20, 21, 22, 24, 25, 25, 30, 33, 33, 36, 48, 53, 53, 57, 61, 63, 64, 71, 78, 82, 85, 86, 87, 90, 92

Final List: 3, 7, 8, 12, 17, 20, 21, 25, 25, 30, 33, 33, 53, 53, 57, 61, 63, 64, 71, 82, 87, 90, 92

exampleuser@exampleserver:~/OList_with_error$ ./a.out 
Segmentation fault

Notice here that on the first run, everything went OK. The main program adds data values at random to the list, then tries to remove them at random... In some cases, the program may run as expected. But in other cases, the program may crash with a Segmentation fault, and it did that on the second run.

This behavior is typical of the way these sorts of errors behave in the real world. It is often the case that these errors only show up intermittently; reproducing them during testing can sometimes be difficult. A good approach is to make sure that your testing code exercises all possible “edge conditions” in the source - for an ordered list, that would mean adding to an empty list, adding a value that would be inserted at the head, and adding a value that would be inserted at the end of the list (in addition to the “normal” case of the value being added in the middle somewhere).

Compiling for Debugging

The compiler flags shown above did not produce an executable file that was ideal for debugging. To be most useful, the debugger will need information like the original source code, the line numbers, and the symbol (variable and function) names. This information is removed from the executable produced by the compiler by default (and for good reason -- it increases the output file size significantly). But since the program crashed and a debugging session would be helpful, the program should now be re-compiled in such a way that the compiler will produce a debugger-friendly executable.

Since the debugger used in this example is GDB, the compiler flag -ggdb will be added to the g++ command (the -g flag tells the compiler to produce debugging information, and the -ggdb flag says to produce that information optimized for the GDB debugger). The full command is:

g++ -W -Wall -ansi -pedantic -ggdb debugexample.cpp

No additional compiler output is shown on the console window, but the size of the executable file a.out has increased (on the system being used to produce this example, it was ~13KiB before and ~40KiB after).

The Debugging Session

Now that the program has been compiled with debugging information included, a debugging session can be initiaited. The command to start GDB is ‘gdb’, and the program’s executable name should be passed as a command-line argument. The full command to start debugging this example (whose executable is named a.out) is shown below:

gdb a.out

This assumes that a.out is in the current working directory. The actual terminal session (with the command and its output) is shown below:

exampleuser@exampleserver:~/OList_with_error$ gdb a.out
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/exampleuser/OList_with_error/a.out...done.
(gdb) 

At this point, GDB has loaded the executable file and is ready for a command. The (gdb) at the end is GDB’s own command prompt. First thing’s first - run the program to try to reproduce the error:

(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out 

Program received signal SIGSEGV, Segmentation fault.
0x08048969 in OList::insert (this=0xbffff6b4, value=35) at olist_with_bug.h:65
65          newnode->next  = previous->next;
(gdb) 

In this case, the error occurred on the first try. If it had not, the run command could be repeated until the crash occurred.

Notice that the function call where the Segmentation fault occurred is shown (OList::insert()) and the actual parameters are also shown. At this point it is clear that the object being operated on (this) is located at address 0xbffff6b4, and the value passed to the value parameter was 35. There is nothing obviously wrong with either of those actual parameters, so continue looking...

The line of code that was actually being executed when the SIGSEGV occurred was line 65 of olist_with_bug.h, which contains the code:

newnode->next  = previous->next;

So, something bad happened on that line of code. Notice that there are two pointer dereferences (by way of the -> operator) on this line. Either of those could be to blame. We need to know what values are contained in the newnode and previous pointers. To find out, the command backtrace full can be used to show the function call stack as well as the values of any local variables along the way:

(gdb) backtrace full
#0  0x08048969 in OList::insert (this=0xbffff6b4, value=35) at olist_with_bug.h:65
        previous = 0x0
        newnode = 0x804c038
#1  0x08048c63 in main () at debugexample.cpp:20
        i = 3
        l = {head = 0x804c008, m_count = 3}
(gdb) 

From this output, it is clear that previous is NULL (contains the value 0x0, which is zero expressed as hexidecimal). Therefore, it was the attempt to dereference previous (in previous->next) that caused the fault. Dereferencing a NULL pointer is always illegal.

So the question is: “How did previous come to be NULL in the first place?”

To answer this, we can consult the source code itself. The code from the insert() function is shown below:

void OList::insert(int value){
    OList::Node* previous;
    OList::Node* newnode = new Node(value);
    _find(value, previous);
    if(!is_empty()){
        newnode->next  = previous->next;        // how is `previous` NULL here?
        previous->next = newnode;
    }
    else{
        newnode->next  = head;
        head           = newnode;
    }
    m_count++;
}

Looking at the path taken to arrive at that particular line, we see that previous is intended to receive its value on the line:

    _find(value, previous);

Which is line 63 in the file. This means that the value being placed in previous by the _find() method is NULL. So the next thing to do is examine the _find() method to see if there is a bug there, or perhaps it is just being used incorrectly. The _find() method (along with its documentation) is:

/**
 * (internal use) - finds location of a value in the list, along with
 * the node immediately to its "left" (the previous node) in the list.
 * In cases where the node isn't in the list, the previous pointer will
 * be set to the node immediately to the "left" of where the node should
 * be located in the list.  This makes the function useful for determining
 * where a node should be added in the list.
 * @param       value    value to find
 * @param[out]  previous pointer to the previous Node in the list (or NULL if the item's position is at head)
 * @return               pointer to the Node where `value` was found, or NULL if not found
 */
OList::Node* OList::_find(int value, Node*& previous)const {
    OList::Node* current = head;
    previous             = 0;
    while(current && current->data < value){
        previous = current;
        current  = current->next;
    }
    return current && current->data == value ? current : 0;
}

From reading the method’s documentation, it seems that previous will indeed be set to NULL if the item’s position is at the head of the list. This makes sense — there is nothing to the logical “left” of the head of a singly-linked list, after all.

So it probably isn’t an error that the _find() method is setting previous to NULL; it just means that the item that is being inserted belongs to the logical “left” of the current head of the list (thus becoming the new head item). Has this situation been properly handled in the insert() method? Take another look:

void OList::insert(int value){
    OList::Node* previous;
    OList::Node* newnode = new Node(value);
    _find(value, previous);                  
    if(!is_empty()){
        newnode->next  = previous->next;        
        previous->next = newnode;
    }
    else{
        head           = newnode;
    }
    m_count++;
}

Now that it is clear that previous will be set to NULL any time the new node belongs at the head of the list, it is worth listing the situations in which that will be the case:

or

Examining the code carefully shows that the “empty list” case has been planned for, but the “new smallest item” case has not. In either of these cases, the new node’s next pointer should be pointed to the current head (which may be NULL if the list is empty, but that is not a problem). Then, the head pointer must be pointed to the new node (the item being inserted becomes the first item). In both cases, the previous pointer will be set to NULL by the call to _find(), so that can be used as the condition in the if, replacing the check for “not empty”:

    if(previous != 0){      // replaces "if(!is_empty()){"

The else clause must also be modified to properly set the new node’s next pointer (for the “new first item” case):

    else{
        newnode->next  = head;      // link in `newnode` in case list is non-empty
        head           = newnode;
    }

This means that the modifed insert() function should look like:

void OList::insert(int value){
    OList::Node* previous;
    OList::Node* newnode = new Node(value);
    _find(value, previous);                  
    if(previous != 0){             // if list is non-empty and new node isn't a minimum
        newnode->next  = previous->next;        
        previous->next = newnode;
    }
    else{
        newnode->next  = head;     // link in `newnode` in case list is non-empty
        head           = newnode;
    }
    m_count++;
}

Now it is once again time to compile and test the code. The GDB debugging session is still running, so it should be stopped before re-compiling:

(gdb) quit
A debugging session is active.

    Inferior 1 [process 5617] will be killed.

Quit anyway? (y or n) y
exampleuser@exampleserver:~/OList_with_error$

The last line shown indicates that the shell prompt is back and waiting for a command. The compile (with debugger output) and test session (testing directly in GDB just in case) is shown below:

exampleuser@exampleserver:~/OList_with_error$ g++ -W -Wall -ansi -pedantic -ggdb debugexample.cpp
exampleuser@exampleserver:~/OList_with_error$ gdb a.out
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/exampleuser/OList_with_error/a.out...done.
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out 
Initial List: 4, 7, 10, 11, 15, 20, 24, 25, 29, 35, 35, 36, 37, 39, 39, 41, 42, 48, 56, 57, 65, 67, 68, 73, 73, 82, 82, 86, 89, 96

Final List: 4, 7, 10, 11, 15, 20, 25, 29, 35, 35, 37, 39, 39, 48, 56, 57, 65, 68, 73, 73, 82, 89, 96

[Inferior 1 (process 6529) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out 
Initial List: 0, 3, 13, 20, 20, 22, 23, 33, 34, 35, 38, 44, 46, 48, 49, 53, 58, 60, 62, 64, 68, 70, 70, 74, 81, 90, 90, 94, 95, 99

Final List: 0, 3, 13, 20, 20, 22, 23, 33, 34, 35, 38, 44, 46, 48, 49, 53, 60, 62, 68, 70, 70, 74, 81, 90, 99

[Inferior 1 (process 6533) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out 
Initial List: 1, 6, 6, 7, 9, 11, 12, 13, 14, 15, 16, 17, 18, 24, 25, 31, 34, 38, 42, 44, 57, 65, 66, 68, 69, 69, 70, 73, 90, 94

Final List: 1, 6, 6, 7, 9, 11, 14, 16, 17, 18, 24, 31, 34, 44, 65, 66, 73, 90, 94

[Inferior 1 (process 6534) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out 
Initial List: 1, 6, 14, 15, 16, 17, 19, 26, 34, 37, 38, 39, 40, 42, 43, 47, 50, 50, 51, 52, 57, 58, 63, 63, 69, 69, 70, 72, 80, 98

Final List: 1, 6, 17, 19, 26, 34, 37, 38, 39, 40, 42, 43, 47, 50, 50, 51, 57, 58, 63, 69, 69, 70, 72, 98

[Inferior 1 (process 6537) exited normally]
(gdb) run
Starting program: /home/exampleuser/OList_with_error/a.out 
Initial List: 1, 4, 6, 12, 16, 17, 18, 21, 22, 27, 37, 42, 42, 43, 46, 48, 50, 51, 51, 56, 57, 61, 65, 74, 77, 92, 94, 94, 95, 96


Program received signal SIGSEGV, Segmentation fault.
0x080489d5 in OList::remove (this=0xbffff6b4, value=1) at olist_with_bug.h:85
85          previous->next = victim->next;
(gdb) 

As you can see, the program ran well four times before crashing with another Segmentation fault on the fifth run!

This error is in a different place (line 85 in the remove() function). Once again, the actual parameters look reasonable. There are two pointers being dereferenced in that line, so a backtrace with local variable values might help shed some light on what happened:

(gdb) backtrace full
#0  0x080489d5 in OList::remove (this=0xbffff6b4, value=1) at olist_with_bug.h:85
        previous = 0x0
        victim = 0x804c0c8
        found = true
#1  0x08048d29 in main () at debugexample.cpp:27
        i = 29
        l = {head = 0x804c0c8, m_count = 20}
(gdb) 

Once again, the previous pointer seems to be to blame, containing a NULL value at the time it is being dereferenced. Back to the source code! The remove() function looks like:

bool OList::remove(int value){
    OList::Node* previous;
    OList::Node* victim = _find(value, previous);
    bool  found  = victim != 0;
    if(victim){
        previous->next = victim->next;
        if(victim == head){
            head = victim->next;
        }
        delete victim;
        m_count--;
    }
    return found;
}

And once again the _find() helper method is being used. From the earlier experience, it is clear that previous will be set to NULL if the value being searched for isn’t found or if it is found at the head of the list. A quick look a the documentation for the _find() method’s return value indicates that if the value is not found, the return value is also NULL. That leaves only one situation: The value being searched for must be at the beginning of the list.

Like before, it seems that the code is handling the “value not found” and “empty list” special case just fine (in fact, those cases are the same when trying to remove a value). It is the special case where the value is at the beginning of the list that is being ignored. In that case, the item at the head of the list is being removed... The code is already checking for this on the line:

        if(victim == head){

All that is needed is to take advantage of this already-existing check to specify that the previous pointer should only be accessed when the “victim” node is not the head node. This is most easily accomplished by adding an else clause for that if statement and moving the line causing the crash into it. The modified remove() function looks like:

bool OList::remove(int value){
    OList::Node* previous;
    OList::Node* victim = _find(value, previous);
    bool  found  = victim != 0;
    if(victim){
        if(victim == head){
            head = victim->next;
        }
        else{
            previous->next = victim->next;  // only if `victim` is not `head`!
        }
        delete victim;
        m_count--;
    }
    return found;
}

After making the code adjustment, stop the GDB session using the quit command, then re-compile and test the program. At this point the program no longer crashes under any sequence of data additions and deletions! The Segmentation fault errors have been tracked down and squashed thanks to some help from GDB.

Don’t forget to re-compile without the additional debugging information if you are intending to distribute your object/executable file (to reduce the file size).

Other Useful GDB Commands

This example only made use of the run and backtrace full commands from GDB; it is compelling that only two commands can help speed the correction of subtle coding errors so dramatically, but sometimes these two just aren’t quite enough. Adding just a few more commands to your GDB arsenal will allow you to debug a majority of simple (common) programming errors of all kinds:

run : Start program execution from the beginning of the program. Allows command-line arguments to be listed, and also allows basic I/O redirection.
backtrace : If (or when) a program has crashed, this command will show trace of where you are currently (which functions you are in). The “trace” consists of the current function call stack, where each call (referred to as a “frame”) is shown in order from most recent to earliest (which will be the entry into main()), and each is numbered for reference.
backtrace full : Just like backtrace, but local variables are also shown for each frame.
list : List the source code surrounding the current execution point, or in the currently selected frame.
print : Prints the value contained in a variable, or in some cases the result returned by a function that has been called in the current frame. print also has several options available to select the desired representation of the value.
break : create a breakpoint, which is a point at which execution of the program will stop during your debugging session so that you can examine local variables, etc.
step : “steps into” next line of code - will drop into a function call
next : like step but will not drop into a function call
continue : continue running the program as usual, until next breakpoint is encountered or program exits