This week I presented a live webinar titled “Linux Fast Boot: Techniques for Aggressive Boot Time Reduction”. One of the slides covered strace, and during the presentation I called it my favorite tool. Indeed, strace is easy to use, readily available on virtually every platform, and does much more than system call tracing, which was its first job. One reason strace is so popular is that it is one of the few tools that can provide useful diagnostic and profiling data on an application for which source code is not available. In fact, the author of the man page states that “In some cases, strace output has proven to be more readable than the source.”
In its simplest form, strace launches an application and reports every system call made by the process and every signal it receives until the process exits. This information is displayed in real time on stderr. Tracing every call this way slows the application down considerably, but in many cases the issue you are chasing is readily apparent in the output. Have you ever launched a Linux process only to have it die without any error message or log output? In these cases, strace is the first tool I pull out. Many times, the file or condition that the failing application requires is immediately obvious.
strace has many options to control its behavior. For example, if you suspect a failure due to a missing file, you can have strace filter its output to show only open calls:
$ strace -e trace=open your-application
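As a rough sketch of how that filtering might be wrapped up for repeated use, the hypothetical helper below (missing_files is my own name, not part of strace) writes the trace to a temporary file and prints only the calls that failed with ENOENT. It assumes a reasonably recent C library, where open() is often issued as the openat system call, so it filters on strace's "file" class (every system call that takes a file name) rather than on open alone.

```shell
# Hypothetical helper (not part of strace): show file-related system calls
# that failed because the file does not exist.
missing_files() {
    log=$(mktemp) || return 1
    # -f follows child processes; -o writes the trace to a file instead of stderr;
    # -e trace=file limits the trace to system calls that take a file name.
    strace -f -o "$log" -e trace=file "$@" >/dev/null 2>&1
    # Keep only the calls that returned ENOENT (No such file or directory).
    grep ENOENT "$log"
    rm -f "$log"
}

# Usage (your-application is a placeholder):
#   missing_files your-application
```

Writing to a file with -o keeps the trace separate from whatever the application itself prints on stderr.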
But strace can be used for much more. The -c flag produces a summary report for each system call, including the time spent, the number of calls, and the number of errors recorded. In my Linux fast boot webinar, I pointed out that most developers would be surprised by the number of superfluous system calls that return errors. They are not “errors” in the sense that a programmer made a coding mistake; rather, they are usually a result of these utilities being made generic and adaptable to a wide variety of environments.
For example, the ssh client on my Ubuntu 12.04 build machine generates 88 system call errors each time it is invoked: 47 on open, 20 on access, 12 on stat, and a few more. Looking deeper, the failures on open occur because ssh searches for a variety of different certs and keys as part of its authentication method, the idea being that any of them would satisfy the authentication requirement. Only after trying several different cert and key names does it finally find my RSA key file. If my goal were to optimize the startup time of this application, the first thing I would do is find a way to preconfigure it to look for only a single cert file and eliminate all those ENOENTs on open. (Maybe there already is a way; I wouldn’t be surprised.)
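To see which system calls are failing for a command of your own, something along these lines may help. It is only a sketch: syscall_errors is a hypothetical helper name, and the awk filter assumes the usual column layout of the -c report (% time, seconds, usecs/call, calls, errors, syscall), which could differ between strace versions.

```shell
# Hypothetical helper (not part of strace): list the system calls that
# recorded errors in the -c summary, highest error count first.
syscall_errors() {
    summary=$(mktemp) || return 1
    # -c prints a per-system-call summary when the command exits;
    # -f includes child processes; -o writes the report to a file.
    strace -c -f -o "$summary" "$@" >/dev/null 2>&1
    # Rows with an error count have six fields; skip the dashed separators
    # (non-numeric fifth field) and the final "total" row.
    awk 'NF == 6 && $5 ~ /^[0-9]+$/ && $6 != "total" { print $5, $6 }' "$summary" | sort -rn
    rm -f "$summary"
}

# Usage (user@example.com is a placeholder host):
#   syscall_errors ssh user@example.com true
```

Each output line is an error count followed by the system call name, which makes it easy to spot the calls worth a closer look.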
Other cool strace tricks
Do you want to discover where the instruction pointer is at the time of the system call?
$ strace -i your-program
Or maybe you want a relative timestamp on each system call, showing the time elapsed since the previous call started, so that long gaps stand out:
$ strace -r your-program
Or maybe you want to learn how much time was spent in each system call:
$ strace -T your-program
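With -T, each line of the trace ends with the time spent in that call in angle brackets, for example <0.000123>. One possible way to rank calls by that figure is sketched below; slowest_syscalls is a hypothetical helper name, and the sed pattern assumes the duration is the last item on the line.

```shell
# Hypothetical helper (not part of strace): show the ten system calls
# that spent the most time in the kernel, slowest first.
slowest_syscalls() {
    log=$(mktemp) || return 1
    # -T appends the time spent in each system call as <seconds>.
    strace -T -o "$log" "$@" >/dev/null 2>&1
    # Move the trailing <seconds> value to the front of each line, then sort
    # numerically in descending order. Lines without a duration are dropped.
    sed -n 's/^\(.*\) <\([0-9.]*\)>$/\2 \1/p' "$log" | sort -rn | head -n 10
    rm -f "$log"
}

# Usage (your-program is a placeholder):
#   slowest_syscalls your-program
```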
The man page for strace provides details on all the capabilities and options of this powerful diagnostic and profiling tool.