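DWARF is a debugging information format. The name is often expanded as "Debugging With Attributed Record Formats". It is not a compression algorithm but a standard for describing a program's structure, variables, functions, line numbers, and so on, so that debuggers (such as `gdb`) and profilers (such as `perf`) can resolve symbols and unwind call stacks.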
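What DWARF does
- When you compile with `gcc` or `clang` and pass `-g`, the compiler emits DWARF debugging information into the object files and the executable.
- This information includes:
  - Source file names and line numbers (used for breakpoints and backtraces)
  - Function and variable names
  - Call frame information (used by `perf record --call-graph dwarf` to unwind the call stack)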
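A quick way to confirm that `-g` really produced DWARF data is to list the `.debug_*` sections of the resulting binary. The sketch below assumes `gcc` and `readelf` (from binutils) are installed; `hello.c` is a throwaway placeholder program, not something from these notes.

```shell
# Build a tiny placeholder program in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF

if command -v gcc >/dev/null 2>&1 && command -v readelf >/dev/null 2>&1; then
    # -g asks the compiler to emit DWARF; it can be combined with -O2.
    gcc -g -O2 -o "$workdir/hello" "$workdir/hello.c"
    # Print the section headers and keep only the DWARF section names.
    readelf -S "$workdir/hello" | grep -o '\.debug_[a-z_]*' | sort -u
else
    echo "gcc or readelf not found; skipping"
fi
```

If the build worked, names such as `.debug_info` and `.debug_line` should appear in the list; a binary built without `-g` (or stripped afterwards) normally shows none.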
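Why does perf need DWARF?
- When sampling, `perf` only captures instruction addresses; to show function names and call chains it has to map those addresses back to symbols and source locations.
- With `--call-graph dwarf`, `perf` uses DWARF unwind information to reconstruct the full call stack. This is more reliable than the traditional frame pointer (`fp`) method, especially when compiler optimizations have omitted the frame pointer.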
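DWARF compared with the other methods
Method | Pros | Cons |
---|---|---|
fp (frame pointer) | Low overhead | Requires `-fno-omit-frame-pointer`, which limits optimization |
dwarf | Does not depend on frame pointers; high accuracy | High overhead; the sampling frequency may need to be lowered |
lbr (last branch record) | Low overhead; hardware-assisted | Supported only on some CPUs; limited stack depth |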
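The three methods in the table map onto the `--call-graph` option of `perf record`. The helper below is a hypothetical dry-run sketch: `perf_record_cmd`, `DWARF_STACK_BYTES` and `PERF_FREQ` are names made up for illustration, and the 8192-byte default for the dwarf stack dump is what I believe `perf record` uses, so treat it as an assumption. It only prints the command instead of running it.

```shell
# Hypothetical helper: print (do not run) the perf command for a given unwinding mode.
perf_record_cmd() {
    mode=$1
    shift
    case "$mode" in
        fp)    graph=fp ;;     # binary should be built with -fno-omit-frame-pointer
        lbr)   graph=lbr ;;    # needs CPU support for last branch records
        dwarf) graph="dwarf,${DWARF_STACK_BYTES:-8192}" ;;  # bytes of stack copied per sample
        *)     echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    echo "perf record -F ${PERF_FREQ:-99} --call-graph $graph -- $*"
}

perf_record_cmd fp ./program args
perf_record_cmd dwarf ./program args
perf_record_cmd lbr ./program args
```

Printing the command first makes it easy to compare the modes side by side; once a line looks right it can be pasted into the terminal as-is.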
How to fix [unknown] frames in perf output
If you see `[unknown]`, the following might be useful, assuming you're using `--call-graph dwarf`.
Note that `--call-graph dwarf,1024` (where `1024` is the stack size) dumps the first 1024 bytes of the stack to the record file for every sample, then uses DWARF debug information to deduce the frames later. This is quite inefficient (it copies a chunk of the stack instead of just the return addresses for each sample), but more importantly, if the stack is deeper than the dumped region (1024 bytes is insufficient), the frames that cannot be recovered show up as `[unknown]`.
So, either try increasing it to `dwarf,65528` (which is the maximum on my machine), or, if that still doesn't work, try `--call-graph lbr` or `--call-graph fp` (the last one may need a recompile).
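When choosing a larger dump size, it helps to keep the value within what `perf` accepts. The sketch below assumes, based on my reading of perf's behavior, that the size is rounded up to a multiple of 8 bytes and that 65528 is the upper limit (the value quoted above); `dwarf_stack_size` is a hypothetical helper name, and both numbers should be checked against your own `perf` build.

```shell
# Hypothetical helper: turn a requested stack dump size into one perf should accept.
dwarf_stack_size() {
    want=$1
    max=65528                        # assumed upper limit, as seen on the author's machine
    size=$(( (want + 7) / 8 * 8 ))   # round up to a multiple of 8 bytes
    if [ "$size" -gt "$max" ]; then
        size=$max
    fi
    echo "$size"
}

# Example: build the option string for a 32 KiB dump.
echo "--call-graph dwarf,$(dwarf_stack_size 32768)"
```

Larger dumps make `perf.data` grow quickly, so it is usually worth lowering `-F` at the same time.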
Other things to try:
- `echo 0 | sudo tee /proc/sys/kernel/kptr_restrict` (suggested in https://users.rust-lang.org/t/flamegraph-shows-every-caller-is-unknown/52408)
- The perf map file must be owned by the correct user (https://stackoverflow.com/a/39662781/5267751)
- `sudo sysctl kernel.perf_event_paranoid=-1` (`perf script` would print this out if needed)

A flame graph can be generated with `perf script flamegraph -F 100 ./program args`, but you can also explicitly separate the recording and the reporting steps (to aid with debugging) by running `perf record --call-graph lbr -F 100 ./program args` followed by `perf script report flamegraph`.
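Before recording, the two kernel settings mentioned under "Other things to try" can be inspected without changing anything. This is a minimal read-only sketch; `show_perf_sysctls` is a hypothetical helper name, and the `/proc/sys/kernel` paths are the standard Linux locations for these sysctls.

```shell
# Hypothetical helper: print the current values of the two sysctls that often
# cause missing symbols or permission errors with perf.
show_perf_sysctls() {
    for name in kptr_restrict perf_event_paranoid; do
        file=/proc/sys/kernel/$name
        if [ -r "$file" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$file")"
        else
            printf 'kernel.%s = (not readable on this system)\n' "$name"
        fi
    done
}

show_perf_sysctls
```

If `kptr_restrict` is non-zero or `perf_event_paranoid` is high, the `tee` and `sysctl` commands from the list are the ones to try.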
Questions that need to be cleaned up:
- https://stackoverflow.com/questions/27842281/unknown-events-in-nodejs-v8-flamegraph-using-perf-events
- https://stackoverflow.com/questions/10933408/how-can-i-get-perf-to-find-symbols-in-my-program
- https://stackoverflow.com/questions/68259699/how-can-you-get-frame-pointer-perf-call-stacks-flamegraphs-involving-the-c-sta
- https://unix.stackexchange.com/questions/276179/missing-stack-symbols-with-perf-events-perf-report-despite-fno-omit-frame-poi
References:
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/recording-and-analyzing-performance-profiles-with-perf_monitoring-and-managing-system-status-and-performance
Notes:
- https://brendangregg.com/flamegraphs.html: explains what a flame chart is (similar to a flame graph, but the x-axis is time, whereas a flame graph merges identical stacks and sorts them alphabetically). I think `perf` doesn't support this...? Not sure if there's a way.
Other unrelated issues:
- https://stackoverflow.com/questions/51157938/perf-doesnt-add-up-to-100