For years now, I have been meaning to write a series of articles on the automated crash reporting system in the IMVU client. This first article will give a bit of background on the structure of the client and show how we handle Python exceptions.
At IMVU, we generally subscribe to the Fail Fast philosophy of handling errors: when the client encounters an unexpected error, we immediately crash the program and ask the user to submit a crash report. As part of the crash report, we send log files, stack traces, system information, and anything else that might help us debug the failure.
You might wonder why we crash the program whenever anything goes wrong rather than trying to catch the error and continue running. Counterintuitively, crashing the program forces us to act on crashes and immediately exposes bugs that might trigger unwanted behavior or lost data down the road.
Now let's talk a little bit about how the client is structured. The IMVU client is written primarily in Python, with time-critical components such as the 3D renderer written in C++. Since the client is a cross between a normal interactive Windows program and a real-time game, the main loop looks something like this:
def main(): while running: pumpWindowsMessages() # for 1/30th of a second updateAnimations() redrawWindows()
This structure assumes that no exceptions bubble into or out of the main loop. Let's imagine that updateAnimations() has a bug and occasionally raises an uncaught exception. If running the client with a standard command-line python invocation, the program would print the exception and stack trace to the console window and exit. That's all great, but our users don't launch our client by invoking python from the command line: we use py2exe to build a standalone executable that users ultimately run. With an unmodified py2exe application, uncaught exceptions are printed to sys.stderr (as above), except there is no console window to display the error. Thus, the py2exe bootstrap code registers a handler so that errors are logged to a file, and when the program shuts down, a dialog box shows something like "An error has been logged. Please see IMVUClient.exe.log."
From a crash reporting standpoint, this is not good enough. We can't be asking our users to manually hunt down some log files on their hard drives and mail them to us. It's just too much work - they will simply stop using our product. (Unfortunately, most of the software out there asks users to do exactly this!) We need a way for the client to automatically handle errors and prompt the users to submit the reports back to us. So let's rejigger main() a bit:
def mainLoop(): while running: pumpWindowsMessages() updateAnimations() redrawWindows() def main(): try: mainLoop() except: error_information = sys.exc_info() if OK == askUserForPermission(): submitError(error_information)
This time, if a bug in updateAnimations() raises an exception, the top-level try: except: clause catches the error and handles it intelligently. In our current implementation, we post the error report to a Bugzilla instance, where we have built custom tools to analyze and prioritize the failures in the field.
This is the main gist of how the IMVU client automatically reports failures. The next post in this series will cover automatic detection of errors in our C++ libraries.