Is hidden runtime output harming your program performance?

A readily overlooked performance cost for realtime applications is the logging of textual information to a file or console. On several occasions in my career during performance profiling and analysis I have discovered uncontrolled output logging code deep within a module that was having a measurable performance impact. Usually this becomes most problematic when the text output is being performed in a loop or in a high frequency function call. Such output can cost in C++ even when it is directed to a null output stream, as will be explained below.

The (potential) problem

First of all I should state what I mean by “logging”. For C/C++ applications I mean any use of textual output to an output stream, such as a file or a console, typically using printf or ostream.

Logging code in complex applications can be critical during application development and testing to track program flow and debug error conditions. The use of logging levels is very common, where the code may define various levels of severity of logging information such as “Info”, “Warning”, “Error”, and “Fatal”.  Textual output with associated severity levels then allows the output to be filtered by severity level according to the use case of the logged text. Often logging will be sent to a file, or to the standard output (which may be a console or connection to a debugger, for example).

Log filtering is usually accomplished dynamically at runtime by checking the logging level of a particular logging request with the currently enabled log level flags. If a logging call is made at an enabled logging level, then that call is forwarded to the relevant system output functions.

There are two distinct costs to performing a logging call:

  1. The cost of composing the output text buffer from the output function input arguments. This all depends on how complex the output expression is, and the type of arguments being expressed.
  2. Once the text output buffer is composed, the cost of driving the output to the output device which is external to the executing program (e.g. a file on a storage device, or a console run by the operating system).

When using C++ streaming syntax “device << my_stuff << etc…”, the cost of composing the output character buffer is essentially unbounded, since operator << could be overloaded for arbitrary types and can be of arbitrary complexity.  While output using the C standard printf is also potentially unbounded, since an output string supplied to %s could be arbitrary large, all other arguments are native data types, and complexity in printf output generation is more transparent, not being wrapped in the syntactic sugar of C++’s streaming operators.

The cost of driving buffered text output to an output device will depend on a) the volume buffered character text to be output, and b) the speed of the output device.

Performance issues

The obvious issue is that it is possible for the cost of textual logging to become a significant part of the cost of an iteration of your runtime main loop. This will typically arise when the number of logging calls is large (even if each output text chunk is small), especially when logging occurs in high frequency functions or inside loops with many iterations.  It is common for programmers to expect that text output is almost cost-free, and this can be the case when it is well controlled – but it is not a safe assumption to make unconditionally. Like dynamic memory allocation, logging can be a highly distributed cost that adds up across the whole program flow, but may not be concentrated in one particular branch of the call tree.

It is also worth pausing to think through the task that has to be performed to generate text output for the native C++ data types. Apart from arguments of type char*, all other types need to be reinterpreted from the native representation, consisting of 1, 2, 4 or 8 byes of base-2 binary into a string of characters, possibly of base-2 or base-8, but most often base-10.  So the signed integer value of -12345678, represented in hexadecimal as the sequence of 4 bytes [B2 9E 43 FF]  (little-endian), has to get somehow transformed into the corresponding series of 9 text bytes “-12345678”, or hexadecimal bytes [45 49 50 51 52 53 54 55 56] if the encoding is ASCII. This is a highly non-trivial task, as can be appreciated by perusing some actual source code for printf (see reference [1]).

So care should be taken that textual logging is implemented in such a way that the composition of the output text buffer is bypassed completely for any disabled log level. In particular, we want to avoid to situation where the application performs all the work required to generate buffered output text for disable log levels, as then we lose all the benefit of having a system of logging level filtering.

Note that when text is logged to a nullified device, such as a dead console or when there is just no standard output device present, the device output cost will be essentially zero, but the cost of generating the output text buffer may not be zero if the code structure dictates that output objects must be procedurally generated before calling the system output functions.  This is the worst-case “hidden” performance issue you need to beware of, referred to in the title – you don’t see any actual standard output, yet the application is still incurring the cost of generating output strings.

Embedded implementation

For a controlled embedded logging system, consider the following approaches:

  • If you have no need to customize logging level per module, or else dynamically at runtime, consider using preprocessor macros to make each disabled logging level compile to nothing for release builds.
  • If you take the more versatile approach of having log level filtering at runtime, make sure the filtering is fully evaluated before any output text, and that output text is only evaluated for log levels that are actually enabled. This should be as simple as wrapping the output call in a conditional block predicated on a bit mask, where each bit would represent the enable state of each log level.

There are many off the shelf logging libraries as well, but be sure to test that they really meet your requirements and expectations. See also [2] for techniques for minimizing the binary instruction invasiveness of logging code.

Conclusion

It is highly recommended practice in realtime embedded applications to ensure that only required logging output is composed and output to a device. When it comes time for a serious release build, or any build where representative realtime performance is a requirement, then it is important that textual logging is carefully controlled is such as way as to have negligible performance impact. For a production release of an embedded product where routine status logging is not required except for the most serious failure conditions, then unused logging code should compile away to nothing.

References

[1] Sample source code for vfprintf.

[2] Pimp my log()

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s