The TotalView debugger can be used to debug both serial and parallel (MPI, OpenMP) applications. It is particularly useful for parallel programs because of its focus on debugging multi-processor programs. It provides both a graphical and a command line interface, and it includes several features for MPI and OpenMP debugging.
- Mahone, Glooscap, Placentia, Fundy: fully supported (head node and compute nodes)
- Replay Engine: available only on Mahone and Glooscap
- Users are advised to have a .tvdrc file in their home directory, which contains recommended settings for debugging Open MPI applications. This configures TotalView to skip mpirun and jump straight into your MPI application; otherwise it will stop deep in the machine code of mpirun itself, which is not what most users want. The .tvdrc file is generated automatically when you load the totalview modulefile:
$ module load totalview
- If you already have a .tvdrc in your home directory, it will not be overwritten. To make sure that you have the right file, remove the existing one and then reload the modulefile.
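The regeneration step above can be sketched as a small shell helper. The function name backup_tvdrc and the .bak suffix are illustrative assumptions, not part of TotalView or the module system. Instead of deleting the existing file, it moves the file aside so that any personal settings can be recovered later.

```shell
# Hypothetical helper (not provided by TotalView or the modules system):
# move an existing ~/.tvdrc aside so that loading the totalview module
# can generate a fresh one.
backup_tvdrc() {
    tvdrc="${HOME}/.tvdrc"
    if [ -f "$tvdrc" ]; then
        mv "$tvdrc" "${tvdrc}.bak"
        echo "moved $tvdrc to ${tvdrc}.bak"
    else
        echo "no existing $tvdrc"
    fi
}
```

After calling backup_tvdrc, reload the modulefile (for example, module unload totalview followed by module load totalview) so that a new .tvdrc is written.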
- TotalView Technologies homepage
- TotalView Support, Documentation, Video Tutorials, Tips & Tricks
- Getting Started with TotalView (Video)
- Printable PDF Documentation
- TotalView Tutorial from Lawrence Livermore National Laboratory
- TotalView Release Notes
Compiling the code
To give the debugger the symbolic debug information it needs, you must recompile your code. Usually, this means adding the -g flag to your compiler command:
$ mpif90 -g -o test test.f90
For memory debugging across nodes, you also need to link the TotalView library tvheap into your code. Your compile line would then look like:
$ mpif77 -g -ltvheap_64 source.f77
- Graphical Interface:
- Command Line Interface:
To use the GUI-based TotalView parallel debugger, connect to the head node of the cluster with X11 forwarding enabled in your SSH client. This allows the windows of a remotely started application to be shown on your own desktop. Unix users need an X11 server running on their desktop (if you are running any window manager, you already have one) and should connect to the head node with the -X option of the SSH client (ssh -X servername.ace-net.ca). Those using PuTTY on Windows need to install Xming and enable X11 forwarding in PuTTY.
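Before starting the GUI, it can help to confirm that forwarding is actually active in the current session. The sketch below relies on the fact that ssh -X sets the DISPLAY variable on the remote side; the function name check_x11_forwarding is an illustrative assumption. A set DISPLAY is a necessary sign, not a guarantee that windows will open.

```shell
# Hypothetical helper: returns 0 if an X11 display is advertised to this
# shell (as ssh -X does by setting DISPLAY), 1 otherwise.
check_x11_forwarding() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 forwarding looks active (DISPLAY=${DISPLAY})"
        return 0
    fi
    echo "DISPLAY is not set; reconnect with ssh -X" >&2
    return 1
}
```

Run check_x11_forwarding on the head node right after logging in; if it fails, reconnect with the -X option before launching TotalView.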
Debugging Open MPI programs
You can use the TotalView debugger either on the head node or through the grid engine queues. To debug a job, you just need to include --debug on the mpirun command line. Open MPI will automatically invoke TotalView to run your MPI processes if the totalview module is loaded in your shell profile.
On the head node
You can debug your program on the head node if your application is not computationally intensive, does not use a lot of memory, and your debugging sessions are short (no process run on the head node should consume more than 15 minutes of CPU time) and use a small number of processes (no more than 2). For example:
$ mpirun --debug -np 2 my_parallel_application
Debugging a serial or OpenMP job:
$ totalview my_program
On the compute nodes (through the grid engine queues)
If your debugging sessions do not qualify to run on the head node, you need to use the dedicated test.q resources, which allow jobs to run for less than 1 hour.
$ qrsh -cwd -pe "ompi*" 4 -l h_rt=00:30:00,test=true mpirun --debug myapplication
If you are debugging large jobs and require more slots than test.q can provide, you can request free slots for an interactive job in the production short.q queue. If free resources are available, they will be granted to you.
$ qrsh -cwd -pe "ompi*" 20 -l h_rt=00:30:00 mpirun --debug myapplication
Debugging a serial or OpenMP job:
$ qrsh -cwd -l h_rt=00:30:00,test=true totalview myapplication
Read more about running interactive jobs.
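The choice between the head node and the test queue can be sketched as a helper that prints the matching command from the examples above. The function name tv_launch_cmd is hypothetical, the 30-minute h_rt value is simply copied from the examples, and the process count is only one of the criteria: CPU time and memory use still have to be judged by the user. The helper only prints a command; it does not run anything.

```shell
# Hypothetical helper: print the launch command suggested above for a given
# number of MPI processes. It does not check CPU time or memory limits.
tv_launch_cmd() {
    nprocs="$1"
    app="$2"
    if [ "$nprocs" -le 2 ]; then
        # Small, short sessions may run directly on the head node.
        echo "mpirun --debug -np ${nprocs} ${app}"
    else
        # Larger sessions go through the grid engine test queue.
        echo "qrsh -cwd -pe \"ompi*\" ${nprocs} -l h_rt=00:30:00,test=true mpirun --debug ${app}"
    fi
}
```

For example, tv_launch_cmd 4 myapplication prints the qrsh command line for test.q, which can be reviewed and then pasted into the shell.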
Debugging an MPI code
If you are debugging an MPI code, be aware of the following behaviour, which may lead you to believe that there is a problem with the debugger. Once you have launched your code in the debugger and answered "yes" in the dialog window to stop the parallel job, you need to set a breakpoint somewhere below the MPI_Init() call and then click "Go". If you do not set a breakpoint below MPI_Init() and just click "Next", you will get a "Waiting to reach location" message that does not go away until you cancel it. If you set a breakpoint before MPI_Init(), the debugger will ignore it.
There is an issue with the pgCC compiler: TotalView does not correctly match the line of code it is executing with the line it displays as executing. Note that MPI wrappers (mpiCC, mpicxx or mpic++) that use the pgCC compiler also suffer from this issue. To debug code built with pgCC or these MPI wrappers, use the Portland Group debugger, PGDBG.
There is currently an issue with programs compiled using the Sun compiler: when you debug such programs, TotalView cannot get the values of variables in the program. This problem does not occur with the Portland or GNU compilers.