
On fixing obscure crashes

When I receive a report of a crash or an exception in an app that I’m working on, it takes me some time to accept it as something real that I have to deal with. My immediate reaction is “oh, it’s only a one-time thing” or “it’s probably not in the code our team has written”. I’m talking about issues with some level of indirection and obscurity, not the ones where the stack trace directly tells you how to fix them.

After procrastinating for some time, I start digging through the code base. Emotionally, the process reminds me of learning something new: it is very uncomfortable, and there is a constant desire to switch to another activity. The problem has to be force-fed to the brain first. Then the most effective way to solve it is to go and get a good night’s sleep.

The next day, as expected, the code paths are much more familiar. I usually start by inventing the most unrealistic scenarios that could cause the crash. It’s very important to write down all of these theories, along with any related thoughts.

A good strategy is to try to reproduce the crash in isolation. It usually starts with some sort of abnormal usage of the application: jump around, click all of the buttons at once, use three apps on different devices at the same time, toggle the network connection on and off. Do anything and everything to break the app. When a certain pattern of this abnormal usage starts to crash the app repeatedly, you’re on the right track. Try to reduce the steps to the minimum that still produces the failure, and stop when only a few are left and you can still reliably reproduce the issue.
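
If the steps can be replayed automatically, the reduction can be scripted too. Here is a minimal sketch in Python, assuming the recorded session is a list of step names and that a `crashes` function can replay a list and report whether the app went down. Everything here is hypothetical: `fake_crash` and the step names are stand-ins for driving a real app, and the greedy one-step-at-a-time removal is just the simplest approach I could think of, not a tuned algorithm.

```python
from typing import Callable, List, Sequence


def minimize_steps(steps: Sequence[str],
                   crashes: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop steps while the remaining sequence still crashes."""
    current = list(steps)
    if not crashes(current):
        raise ValueError("the full sequence must reproduce the crash")
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            # Try the sequence with step i removed.
            candidate = current[:i] + current[i + 1:]
            if crashes(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the shorter sequence
    return current


def fake_crash(steps: List[str]) -> bool:
    # Hypothetical stand-in for replaying the steps against the real app:
    # it "crashes" when a save happens at any point after the network goes off.
    if "network_off" not in steps:
        return False
    return "tap_save" in steps[steps.index("network_off") + 1:]


recorded = ["open_app", "tap_profile", "network_off",
            "scroll", "tap_save", "go_back"]
minimal = minimize_steps(recorded, fake_crash)
print(minimal)  # ['network_off', 'tap_save']
```

Each removal costs one replay, so this only pays off when a replay is cheap and the crash is deterministic. For a flaky crash, `crashes` would have to replay the same steps several times before answering.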

Most of the time, an “aha!” moment arrives at this point, and you know what should be fixed. It’s obvious now, and it wasn’t that obvious just a few minutes ago.

In the cases where it’s still not obvious, what helps a lot is to set up a separate project from scratch that follows the same code path without any other code around it. This strategy shortens the feedback cycle. The crash will start to happen at some point, and you end up distilling a single atomic change to the application that causes the issue.

By now it might be clear what code causes the issue, but it may not be clear why. If it’s a closed-source API that you’re using, don’t be ashamed of hacking around it and applying a good old monkey patch. Remember: users do not see your code, and even if they did, they most likely wouldn’t care about it.

Last edited on May 15, 2019