https://nushackers.github.io/hackertools-slides/2020ht5/
Today's focus is on the tools, so you know what information you can get.
That will let you develop an intuition for how you should go about debugging issues when they arise.
Python debugger
To run:python3 -m pdb script.py
(b)reak [filename:]lineno[, condition]
: set breakpoint at line(b)reak function[, condition]
: set breakpoint at function(s)tep
: step (stop at next executed line)(n)ext
: step (stop at next executed line in this function)(r)eturn
: continue until function returns(c)ontinue
: continue execution(l)ist
: list source around current line(w)here
: print stack trace(u)p
: go to caller frame(d)own
: go to callee frame(p)rint expr
: evaluate and print expr
For more, see the manual.
Let's debug this script:
def bubble_sort(arr):
n = len(arr)
for i in range(n):
for j in range(n):
if arr[j] > arr[j+1]:
arr[j] = arr[j+1]
arr[j+1] = arr[j]
return arr
print(bubble_sort([4, 2, 1, 8, 7, 6]))
GNU debugger
To run:gdb program
To run and provide arguments to debuggee:gdb --args program arg1 arg2...
r
: run programstart
: run and break at entrypointc
: continue programbt
: show backtrace/stacktracef frameno
: go to frame frameno
(according to bt
)b [filename:]lineno [if condition]
: set breakpoint at lined breakpointno
: delete breakpointp expr
: evaluate and print expr
l
: list source around current linedisas
: list assembly around current instructions[i]
: step one line[instruction]n[i]
: step one line[instruction], but don't follow callsfinish
: run until this function returnsrecord
: start recording executionreverse-cmd
: the above stepping commands, but in reverse (use after record
)Let's debug this program:
#include <stdio.h>
int access(int *array, int index) {
return array[index];
}
int main() {
int arr[] = { 0, 1, 2 };
printf("%d\n", access(0, 1));
}
First compile it with debugging symbols enabled:gcc -g -o access access.c
Then run gdb:gdb ./access
A suite of dynamic checkers and profilers.
We will look at Valgrind's default tool, Memcheck.
Other commonly used tools include Helgrind and DRD, which detect race conditions in multithreaded programs.
Let's debug this program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
int array[] = { 5, 4, 3, 2, 1 };
int *dynamic_array = malloc(sizeof(array));
memcpy(dynamic_array, array, sizeof(array));
for (int i; i <= 5; ++i) {
printf("%d %d\n", array[i], dynamic_array[i]);
}
}
$ gcc -g -o vg vg.c
$ ./vg
5 5
4 4
3 3
2 2
1 1
29911 0
$ valgrind --leak-check=full --track-origins=yes ./vg
==30813== Memcheck, a memory error detector
==30813== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30813== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==30813== Command: ./vg
==30813==
==30813== Conditional jump or move depends on uninitialised value(s)
==30813== at 0x109205: main (vg.c:9)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
==30813== Use of uninitialised value of size 8
==30813== at 0x1091DF: main (vg.c:10)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
==30813== Use of uninitialised value of size 8
==30813== at 0x1091E6: main (vg.c:10)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
5 5
4 4
3 3
2 2
1 1
==30813== Invalid read of size 4
==30813== at 0x1091DF: main (vg.c:10)
==30813== Address 0x4a50054 is 0 bytes after a block of size 20 alloc'd
==30813== at 0x483A77F: malloc (vg_replace_malloc.c:307)
==30813== by 0x1091AC: main (vg.c:7)
==30813==
==30813== Conditional jump or move depends on uninitialised value(s)
==30813== at 0x48F1663: __vfprintf_internal (in /usr/lib/libc-2.32.so)
==30813== by 0x48DCBBE: printf (in /usr/lib/libc-2.32.so)
==30813== by 0x1091FC: main (vg.c:10)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
==30813== Use of uninitialised value of size 8
==30813== at 0x48D6C7B: _itoa_word (in /usr/lib/libc-2.32.so)
==30813== by 0x48F0668: __vfprintf_internal (in /usr/lib/libc-2.32.so)
==30813== by 0x48DCBBE: printf (in /usr/lib/libc-2.32.so)
==30813== by 0x1091FC: main (vg.c:10)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
==30813== Conditional jump or move depends on uninitialised value(s)
==30813== at 0x48D6C8C: _itoa_word (in /usr/lib/libc-2.32.so)
==30813== by 0x48F0668: __vfprintf_internal (in /usr/lib/libc-2.32.so)
==30813== by 0x48DCBBE: printf (in /usr/lib/libc-2.32.so)
==30813== by 0x1091FC: main (vg.c:10)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
==30813== Conditional jump or move depends on uninitialised value(s)
==30813== at 0x48F12E3: __vfprintf_internal (in /usr/lib/libc-2.32.so)
==30813== by 0x48DCBBE: printf (in /usr/lib/libc-2.32.so)
==30813== by 0x1091FC: main (vg.c:10)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
==30813== Conditional jump or move depends on uninitialised value(s)
==30813== at 0x48F0788: __vfprintf_internal (in /usr/lib/libc-2.32.so)
==30813== by 0x48DCBBE: printf (in /usr/lib/libc-2.32.so)
==30813== by 0x1091FC: main (vg.c:10)
==30813== Uninitialised value was created by a stack allocation
==30813== at 0x109169: main (vg.c:5)
==30813==
31 0
==30813==
==30813== HEAP SUMMARY:
==30813== in use at exit: 20 bytes in 1 blocks
==30813== total heap usage: 2 allocs, 1 frees, 1,044 bytes allocated
==30813==
==30813== 20 bytes in 1 blocks are definitely lost in loss record 1 of 1
==30813== at 0x483A77F: malloc (vg_replace_malloc.c:307)
==30813== by 0x1091AC: main (vg.c:7)
==30813==
==30813== LEAK SUMMARY:
==30813== definitely lost: 20 bytes in 1 blocks
==30813== indirectly lost: 0 bytes in 0 blocks
==30813== possibly lost: 0 bytes in 0 blocks
==30813== still reachable: 0 bytes in 0 blocks
==30813== suppressed: 0 bytes in 0 blocks
==30813==
==30813== For lists of detected and suppressed errors, rerun with: -s
==30813== ERROR SUMMARY: 28 errors from 10 contexts (suppressed: 0 from 0)
"Address sanitiser". Built into GCC and Clang.
It detects some of what Valgrind's Memcheck detects:
Let's debug the same program.
$ gcc -no-pie -fno-omit-frame-pointer -fsanitize=address -O1 -g -o vg vg.c
$ ./vg
5 5
4 4
3 3
2 2
1 1
=================================================================
==31908==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000024 at pc 0x000000401438 bp 0x7d39cddeb2f0 sp 0x7d39cddeb2e0
READ of size 4 at 0x603000000024 thread T0
#0 0x401437 in main vg.c:10
#1 0x6f22467d8151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
#2 0x40111d in _start (vg+0x40111d)
0x603000000024 is located 0 bytes to the right of 20-byte region [0x603000000010,0x603000000024)
allocated by thread T0 here:
#0 0x6f2246a2b459 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x401338 in main vg.c:7
#2 0x6f22467d8151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
SUMMARY: AddressSanitizer: heap-buffer-overflow vg.c:10 in main
Shadow bytes around the buggy address:
0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c067fff8000: fa fa 00 00[04]fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==31908==ABORTING
Undefined behaviour sanitiser, also built in GCC and Clang.
It detects some undefined behaviour.
Let's debug this program:
#include <stdio.h>
int main() {
char *bytes = "Hello world!";
printf("%d %d\n", 5 << bytes[5], *(int *)(bytes + 1));
}
$ gcc -no-pie -fno-omit-frame-pointer -fsanitize=undefined -O1 -g -o ubsan ./ubsan.c
$ ./ubsan
ubsan.c:5:5: runtime error: load of misaligned address 0x000000402005 for type 'int', which requires 4 byte alignment
0x000000402005: note: pointer points here
00 02 00 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 25 64 20 25 64 0a 00 2e 2f 75 62 73 61 6e 2e 63
^
ubsan.c:5:25: runtime error: shift exponent 32 is too large for 32-bit type 'int'
0 1869376613
Valgrind is essentially dynamically recompiling your program. It incurs about a 20x–30x slowdown.
ASan, UBSan (and other sanitisers) incur "only" a few times slowdown (rather than multiples of 10);
System call tracer
strace program args...
$ strace cat non-existent-file
execve("/usr/bin/cat", ["cat", "non-existent-file"], 0x73f8f8736588 /* 54 vars */) = 0
brk(NULL) = 0x9a3ecd41255
arch_prctl(0x3001 /* ARCH_??? */, 0x7d7c84280b90) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=241330, ...}) = 0
mmap(NULL, 241330, PROT_READ, MAP_PRIVATE, 3, 0) = 0x632e36a02000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\202\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\207\360\21\247\344\314?\306\nT\320\323\335i\16t"..., 68, 880) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=2159552, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x632e36a00000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 1868448, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x632e36837000
mmap(0x632e3685d000, 1363968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x632e3685d000
mmap(0x632e369aa000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x173000) = 0x632e369aa000
mmap(0x632e369f6000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x632e369f6000
mmap(0x632e369fc000, 12960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x632e369fc000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x632e36835000
arch_prctl(ARCH_SET_FS, 0x632e36a015c0) = 0
mprotect(0x632e369f6000, 12288, PROT_READ) = 0
mprotect(0x9a3eb003000, 4096, PROT_READ) = 0
mprotect(0x632e36a6f000, 4096, PROT_READ) = 0
munmap(0x632e36a02000, 241330) = 0
brk(NULL) = 0x9a3ecd41255
brk(0x9a3ecd62255) = 0x9a3ecd62255
brk(0x9a3ecd63000) = 0x9a3ecd63000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3045984, ...}) = 0
mmap(NULL, 3045984, PROT_READ, MAP_PRIVATE, 3, 0) = 0x632e3654d000
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x2), ...}) = 0
openat(AT_FDCWD, "non-existent-file", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "cat: ", 5cat: ) = 5
write(2, "non-existent-file", 17non-existent-file) = 17
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2998, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 3072) = 2998
read(3, "", 3072) = 0
close(3) = 0
openat(AT_FDCWD, "/usr/share/locale/en_SG.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_SG.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_SG/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, ": No such file or directory", 27: No such file or directory) = 27
write(2, "\n", 1
) = 1
close(1) = 0
close(2) = 0
exit_group(1) = ?
+++ exited with 1 +++
Library call tracer
ltrace program args...
Let's trace this program:
#include <stdio.h>
int main() {
printf("Hello world!\n");
return 0;
}
$ gcc -o hello hello.c
$ ltrace ./hello
puts("Hello world!"Hello world!
) = 13
+++ exited (status 0) +++
Some distributions compile their programs with an optimisation -fno-plt
that causes ltrace
to break.
An alternative is uftrace
(available in Debian/Ubuntu and AUR).
A shell builtin that just times a given command.
Let's profile this program:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
const int iter = atoi(argv[1]);
int c;
for (int i = 0; i < iter; ++i) {
for (int j = 0; j < 100000; ++j) {
c += i * j;
}
}
printf("%d\n", c);
return 0;
}
$ gcc -g -O2 -o time time.c
$ time ./time 1000
1640597472
./time 1000 0.05s user 0.00s system 99% cpu 0.047 total
$ time ./time 10000
873921568
./time 10000 0.26s user 0.00s system 99% cpu 0.259 total
$ time ./time 100000
681331360
./time 100000 2.39s user 0.00s system 99% cpu 2.392 total
$ for i in {1..10}; do time ./time 10000; done
-351732704
./time 10000 0.26s user 0.00s system 99% cpu 0.262 total
270228512
./time 10000 0.25s user 0.00s system 99% cpu 0.249 total
-272614368
./time 10000 0.25s user 0.00s system 99% cpu 0.249 total
12766240
./time 10000 0.25s user 0.00s system 99% cpu 0.248 total
147876896
./time 10000 0.25s user 0.00s system 99% cpu 0.251 total
577096736
./time 10000 0.25s user 0.00s system 99% cpu 0.250 total
-1691562976
./time 10000 0.24s user 0.00s system 99% cpu 0.239 total
-326898656
./time 10000 0.24s user 0.00s system 99% cpu 0.239 total
-733873120
./time 10000 0.24s user 0.00s system 99% cpu 0.245 total
-1463952352
./time 10000 0.25s user 0.00s system 99% cpu 0.252 total
A versatile low-overhead Linux performance measurement tool.
Let's profile time
.
$ perf stat ./time 50000
-200296928
Performance counter stats for './time 50000':
1,239.74 msec task-clock # 0.999 CPUs utilized
4 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
56 page-faults # 0.045 K/sec
5,039,600,054 cycles # 4.065 GHz (83.30%)
1,382,167 stalled-cycles-frontend # 0.03% frontend cycles idle (83.30%)
3,510,477 stalled-cycles-backend # 0.07% backend cycles idle (83.31%)
20,015,313,490 instructions # 3.97 insn per cycle
# 0.00 stalled cycles per insn (83.30%)
4,999,283,046 branches # 4032.528 M/sec (83.46%)
78,502 branch-misses # 0.00% of all branches (83.32%)
1.240383573 seconds time elapsed
1.237088000 seconds user
0.000000000 seconds sys
Let's profile time
.
$ perf record -F 1000 -g ./time 50000
$ perf report
Let's repeat perf stat/record/report with this program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
const int c = atoi(argv[1]);
FILE *urandom = fopen("/dev/urandom", "r");
char *buf = malloc(c * 4096);
size_t sum = 0;
for (int i = 0; i < c; ++i) {
fread(buf, c * 4096, 1, urandom);
for (int j = 0; j < c * 4096; j += sizeof(int)) {
sum += *(int *)(buf + j);
}
}
printf("%zu\n", sum);
return 0;
}
$ gcc -g -O2 -o perf perf.c
$ perf stat ./perf 400
$ perf record -F 1000 -g ./perf 400
$ perf report
A tool to generate... flamegraphs.
Let's generate a flamegraph from our perf recording.
$ perf script > out.perf
$ ./stackcollapse-perf.pl out.perf > out.folded
$ ./flamegraph.pl out.folded > out.svg
$ ./flamegraph.pl --reverse out.folded > out-rev.svg
Benchmarking tool
Let's benchmark the program we used with time
earlier.
$ hyperfine -S bash -w 3 './time 50000'
$ hyperfine -S bash -w 3 -P count 10000 50000 -D 10000 './time {count}'
Note that the tools we covered are language-agnostic.
perf
works for languages that compile to native code (C, C++, Rust, etc.).
Languages that do not compile to native code (Python, JS, .NET, Java, etc.) will have their own tools.
The other common type of profiling is memory usage profiling.
We won't cover that today, as those are typically language-specific.
htop is a tool to view processes and their resource usage.
iotop is a tool to specifically view process IO usage.
iotop -Po
dstat is a tool to view overall system resource usage.
free shows system memory usage.
df shows the amount of free space on the disks in the system.
du shows the total space used by files.
fuser shows which processes are using the specified files.
lsof shows open files.
ss shows open sockets.
stress lets you stress test your system.
s-tui shows CPU clocks, device temperatures and fan speeds.
journalctl shows you your system log.
This is useful for general system troubleshooting.
dmesg shows the kernel log.
Generally, kernel log output is included in journalctl.
We looked at basic debugging, automated program checkers, basic profiling, system monitoring, and some other tools.
Debugging, profiling and troubleshooting comes down to knowing your tools well and then developing an intuition on what or where to look.
You have some tools to start off with; now go and fix things and make things fast!
Feedback
bit.ly/2020ht5