Communications of the ACM,
Vol. 59 No. 6, Pages 48-57
An everyday problem in our industry is understanding how software is consuming resources, particularly CPUs. What exactly is consuming how much, and how did this change since the last software version? These questions can be answered using software profilers—tools that help direct developers to optimize their code and operators to tune their environment. The output of profilers can be verbose, however, making it laborious to study and comprehend. The flame graph provides a new visualization for profiler output and can make for much faster comprehension, reducing the time for root cause analysis.
In environments where software changes rapidly, such as the Netflix cloud microservice architecture, it is especially important to understand profiles quickly. Faster comprehension can also make the study of foreign software more successful, where one's skills, appetite, and time are strictly limited.
The following letter was published in the Letters to the Editor in the August 2016 CACM (http://cacm.acm.org/magazines/2016/8/205034).
The emphasis on visualizing large numbers of stack samples, as in, say, flame graphs in Brendan Gregg's article "The Flame Graph" (June 2016) actually works against finding some performance bottlenecks, resulting in sub-optimal performance of the software being tuned. Any such visualization must necessarily discard information, resulting in "false negatives," or failure to identify some bottlenecks. For example, time can be wasted by lines of code that happen to be invoked in numerous places in the call tree. The call hierarchy, which is what flame graphs display, cannot draw attention to these lines of code.(1) Moreover, one cannot assume the bottlenecks can be ignored; even a particular bottleneck that starts small does not stay small, on a percentage basis, after other bottlenecks have been removed. Gregg made much of 60,000 samples and how difficult they are to visualize. However, he also discussed finding and fixing a bottleneck that resulted in saving 40% of execution time. That means the fraction of samples displaying the bottleneck was at least 40%. The bottleneck would thus have been displayed, with statistical certainty, in a purely human examination of 10 or 20 random stack samples with no need for 60,000. This is generally true of any bottleneck big enough, on a percentage basis, to be worth fixing; moreover, every bottleneck grows as others are removed. So, if truly serious performance tuning is being done, it is not necessary or even helpful to visualize thousands of samples.
Michael R. Dunlavey
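The letter's sampling argument can be checked with a short binomial calculation. The sketch below is not taken from the letter or the article. It assumes each stack sample independently shows a given bottleneck with probability equal to the fraction of time that bottleneck consumes, and it treats "seen on at least two samples" as the threshold for noticing it. Both are illustrative assumptions, and the function name is hypothetical.

```python
from math import comb


def prob_seen_at_least(k, n, p):
    """Probability that a bottleneck appears on at least k of n samples.

    Assumes each sample independently lands in the bottleneck with
    probability p (the fraction of run time it accounts for), so the
    number of hits is Binomial(n, p).
    """
    # P(X >= k) = 1 - P(X <= k - 1)
    return 1.0 - sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k)
    )


if __name__ == "__main__":
    # Compare a large bottleneck (40%) with a small one (2%) at the
    # sample counts mentioned in the letter.
    for p in (0.40, 0.02):
        for n in (10, 20):
            chance = prob_seen_at_least(2, n, p)
            print(f"p={p:.2f}, n={n}: P(seen >= 2 times) = {chance:.4f}")
```

Under these assumptions, a 40% bottleneck is very likely to appear at least twice in 10 or 20 samples, while a 2% cost usually does not. That contrast is one way to read both the letter's claim and the response that follows.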
More samples provide more benefits. One is that performance wins of all magnitudes can be accurately quantified and compared, including even the 1%, 2%, and 3% wins, finding more wins and wasting less engineering time investigating false negatives. Samples are cheap. Engineering time is not. Another benefit is quantifying the full code flow, illustrating more tuning possibilities. There are other benefits, too. As for code invoked in numerous places, my article discussed two techniques for identifying them: searching and merging top-down.
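To illustrate how code invoked from many call paths can still be quantified from stack samples, the following sketch totals the samples in which each function appears anywhere on the stack. It is written in the spirit of the searching technique mentioned above and is not the article's implementation. It assumes input in a semicolon-delimited folded-stack form (`frame;frame;frame count`). The sample data and all function names in it are made up for illustration.

```python
from collections import Counter


def inclusive_counts(folded_lines):
    """Return (per-frame sample counts, total samples).

    Each input line is assumed to look like "a;b;c 12": a call stack
    from root to leaf, a space, then the number of samples with that
    stack. A frame is counted once per stack even if it recurses.
    """
    counts = Counter()
    total = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, n_text = line.rpartition(" ")
        n = int(n_text)
        total += n
        for frame in set(stack.split(";")):
            counts[frame] += n
    return counts, total


# Hypothetical profile: log_debug is called from three different paths,
# so no single tower in the call hierarchy shows its full cost.
samples = [
    "main;parse;log_debug 5",
    "main;render;draw;log_debug 4",
    "main;render;draw 20",
    "main;net;send;log_debug 6",
    "main;net;recv 65",
]

counts, total = inclusive_counts(samples)
share = 100.0 * counts["log_debug"] / total
print(f"log_debug appears in {counts['log_debug']} of {total} samples ({share:.0f}%)")
```

In this made-up profile, `log_debug` never accounts for more than 6 samples on any one path, but the merged count shows it on 15 of 100 samples.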
(1) Dunlavey, M.R. Unknown events in nodejs/v8 flamegraph using perf_events; http://stackoverflow.com/a/27867426/23771