Achieve 99.5%+ crash-free users consistently?

Raviraj Desai
8 min read · Jul 9, 2022
Crashlytics

Every Android developer faces a fundamental truth about the app ecosystem: crashes are your worst enemy. Not only are they harder to prevent in native Android apps than in iOS, but the many Android OS versions, device fragmentation, and OEM UI customisations make the situation even worse.

Best Development Practices for Avoiding Crashes

  1. Crash tool: To measure the impact of crashes and the reasons behind them, we looked for a tool that was available for both iOS and Android and could provide complete insight into each crash. Crashlytics (now part of Firebase) solved our problem, and it is free as well. Using Crashlytics, we were able to find the reason behind each crash (sometimes it is just a null pointer exception) and fix it in time. Today we have 99% crash-free users and are aiming for 99.9% in the coming days.
  2. Frameworks: The choice of framework helps in achieving reliable functionality. Frameworks are available nowadays for almost every kind of functionality, and many of them have been used and tested by many developers. Before picking any open source framework, check its reputation, its last updated date (to confirm it keeps up with new OS versions), its open issues, and how the community responds to those issues.
  3. Design (UI/UX): How can design help in building a crash-free app? Design is an important aspect of the app, and if there are no defined rules or guidelines for the product, the developers’ job becomes hard to manage and maintain, as they have to create new components, colour sheets, and font sheets with every change.
  4. Brainstorming: “You should spend 80% of your time sharpening the axe rather than cutting the tree.” This saying is 50% true for development as well. Before starting development of any feature, we should brainstorm about what to code and why. This helps us understand the work effort, reusability, and future use of the functionality.
  5. Coding structure: When multiple developers or firms work on a project, each uses its own coding structure and standards, which makes the code very difficult for any new developer to understand and work on. When we started in-house development, we had to create a common coding standard and development pattern across all the products.
  6. Unit testing: Manual testing catches many issues, and adding unit tests to the code gives much greater confidence that a feature keeps working as intended (though no test suite can guarantee it 100%). It also makes life easier for developers and testers: with every new PR, the system itself performs a basic functionality check and reports an error if anything breaks.
  7. Why?: This is a very important part of any development life cycle. Whether it is adding a new feature, updating an existing one, selecting a framework, or designing an API, teams should ask “why” as many times as they can before starting. Why is this feature needed? Why is this framework good for the product? Why can’t we add some new fields to the API response? Answering these questions is what makes a successful product.

We will talk about the following things

  • Know thy enemy
  • Top 6 reasons Android apps are crash-prone and how to avoid them
  • 99% Crash Free Users: Gold Standard
  • Goals: What metrics to track
  • Best Development Practices for Avoiding Crashes
  • Phased Rollout

Know thy enemy

No matter how hard you try, bugs will always exist. Similarly, for apps, crashes will always exist. What you can do is build the best development practices and processes to catch crashes early on. Even then, some crashes will slip through your development and QA lifecycle, and the best thing you can do is actively identify the most harmful issues and fix them as quickly as possible.

Managing a crash-free app experience should be one of the top goals for any developer.

1. Phased Rollout

Every Android developer is familiar with the nerve-wracking experience of pushing the release button and having the app become available to thousands of users immediately. What if things break? What if the app starts crashing? There is no way to roll back. This was quite a challenge for Android developers in the early days. To solve this problem, Google introduced phased releases, which let you ship an update to a small percentage of users first and widen it gradually.
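A phased release only helps if someone decides, at each step, whether to go further. That decision can be sketched as a small rule in Java. Everything here is an assumption made for illustration: the `RolloutGate` name, the stage percentages, and the idea of gating on the crash-free users rate. The rollout percentage itself is configured in Google Play, not in app code.

```java
/** Hypothetical rollout gate; the stage percentages below are assumptions. */
final class RolloutGate {
    static final int[] STAGES = {1, 5, 20, 50, 100};

    /**
     * Returns the next rollout percentage, or the current one if the
     * release should hold because the crash-free rate is below the goal.
     */
    static int nextStage(int currentPercent, double crashFreeUsersPercent, double goalPercent) {
        if (crashFreeUsersPercent < goalPercent) {
            return currentPercent; // hold here until the crashes are fixed
        }
        for (int stage : STAGES) {
            if (stage > currentPercent) {
                return stage; // first stage wider than the current one
            }
        }
        return 100; // already fully rolled out
    }

    public static void main(String[] args) {
        // Healthy release at 5%: widen to the next stage.
        System.out.println(nextStage(5, 99.7, 99.5));
        // Unhealthy release at 5%: stay where we are.
        System.out.println(nextStage(5, 99.1, 99.5));
    }
}
```

The point of the sketch is that a bad build stops at a small audience instead of reaching every user at once.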

2. Billion-dollar mistake : NullPointerException

If you are an Android developer, you most probably already know what a NullPointerException is. It is an exception (error) that is particularly common when programming in Java. For a beginner, it is a confusing and daunting message to receive, but it can be just as much of a headache for pros to deal with! In fact, null references are so common and so damaging that they are often referred to as “the billion-dollar mistake”.

How to avoid them?

A common location for null pointer exceptions is the Android onResume() method. When the app goes to the background, the system may reclaim some of its memory, and various references are lost along the way. To avoid this, save your data in the onPause() method to a more persistent layer, such as a local database or cache, then pull it back out within onResume().

Kotlin also provides null safety features. Using them in everyday coding has definitely helped our apps avoid NullPointerExceptions and deliver a stable experience to our customers.
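The same idea can be sketched in plain Java with `java.util.Optional`, which forces the caller to decide what happens when a value is missing. The `UserProfile` and `Greeter` classes are hypothetical, invented only for this example.

```java
import java.util.Optional;

/** Hypothetical model class, used only to illustrate null-safe access. */
final class UserProfile {
    private final String nickname; // may be null when the user never set one

    UserProfile(String nickname) {
        this.nickname = nickname;
    }

    /** Wraps the nullable field so callers cannot dereference it blindly. */
    Optional<String> nickname() {
        return Optional.ofNullable(nickname);
    }
}

final class Greeter {
    /** Returns a greeting without ever dereferencing a null reference. */
    static String greet(UserProfile profile) {
        return Optional.ofNullable(profile)      // the profile itself may be missing
                .flatMap(UserProfile::nickname)  // and so may the nickname
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .map(name -> "Hello, " + name)
                .orElse("Hello, guest");         // explicit fallback instead of a crash
    }

    public static void main(String[] args) {
        System.out.println(greet(new UserProfile("Asha")));
        System.out.println(greet(new UserProfile(null)));
        System.out.println(greet(null));
    }
}
```

In Kotlin, the safe-call (`?.`) and Elvis (`?:`) operators express the same chain more compactly, and the compiler rejects code that dereferences a nullable type without a check.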

3. Multiple Android versions

This is quite contrary to iOS, where about 90% of devices are on the latest iOS version. On Android, this number is close to 13%. Android developers have to support their apps all the way back to Android 5.0 (launched in 2014), and some even have to go back to Android 4.1 (launched in 2012). As a result, app developers deal with complex issues such as handling older OS versions differently and working with functionality that is not fully supported on every Android version. This leads to unexpected crashes on older Android versions from time to time.

How to avoid them?

Fortunately, testing your app on Android emulators running different OS versions helps identify the majority of OS-related issues. You can either run the app manually in your local emulator or use a cloud device farm such as Firebase Test Lab or AWS Device Farm.
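Many OS-related crashes come from calling an API that does not exist on an older Android version, so the call needs a version guard. Below is a minimal sketch of such a guard, written in plain Java so it runs anywhere. The `FeatureGate` class and the strategy names are hypothetical; on a real device the API level would come from `Build.VERSION.SDK_INT` rather than a parameter.

```java
/** Plain-Java sketch of an API-level guard; class and strategy names are hypothetical. */
final class FeatureGate {
    /** Notification channels were introduced in Android 8.0 (API level 26). */
    static final int API_NOTIFICATION_CHANNELS = 26;

    /** Picks a code path based on the API level the app is running on. */
    static String notificationStrategy(int sdkInt) {
        if (sdkInt >= API_NOTIFICATION_CHANNELS) {
            return "channel"; // safe to use the newer API
        }
        return "legacy";      // older OS: fall back instead of crashing
    }

    public static void main(String[] args) {
        // Exercise both branches, much like running emulators on several OS versions.
        int[] apiLevels = {16, 21, 26, 33};
        for (int level : apiLevels) {
            System.out.println(level + " -> " + notificationStrategy(level));
        }
    }
}
```

Passing the API level in as a parameter also makes both branches easy to cover in tests, which complements the emulator runs described above.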

4. Memory management

An out-of-memory error is thrown when there is no memory left for your app to use. Keep in mind, though, that the last memory allocation that triggered the error is not necessarily what caused the memory leak, which is why the stack trace often won’t help you. Instead, prior memory allocations added up, and it was only the last one that reached the threshold. Think of it as the straw that broke the camel’s back.

How to avoid them?

There’s a tool called LeakCanary, built by the team at Square, that helps here. It is a memory leak detection library for Android: it watches for objects that stay in memory after they should have been released and reports what is still holding on to them.
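To show the kind of leak such a tool is meant to report, here is a minimal plain-Java sketch. It does not show how LeakCanary works internally. The `EventRegistry` and `Screen` classes are hypothetical stand-ins for a long-lived object and an Android screen.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical long-lived registry, standing in for any static or app-wide object. */
final class EventRegistry {
    private final List<Runnable> listeners = new ArrayList<>();

    void register(Runnable listener) {
        listeners.add(listener); // strong reference: keeps the listener and its owner alive
    }

    void unregister(Runnable listener) {
        listeners.remove(listener);
    }

    int retainedCount() {
        return listeners.size();
    }
}

/** Hypothetical screen holding a large buffer, like an Activity holding bitmaps. */
final class Screen {
    private final byte[] buffer = new byte[1024 * 1024];
    // The lambda captures this Screen, so the registry indirectly retains the buffer.
    private final Runnable onEvent = () -> System.out.println(buffer.length);
    private final EventRegistry registry;

    Screen(EventRegistry registry) {
        this.registry = registry;
        registry.register(onEvent);
    }

    /** Must be called when the screen goes away, otherwise the registry leaks it. */
    void close() {
        registry.unregister(onEvent);
    }
}

final class LeakDemo {
    public static void main(String[] args) {
        EventRegistry registry = new EventRegistry();
        new Screen(registry);                 // never closed: stays reachable via the registry
        Screen cleaned = new Screen(registry);
        cleaned.close();                      // properly released
        System.out.println(registry.retainedCount());
    }
}
```

Each forgotten `close()` keeps another megabyte reachable, and enough of them eventually produce the out-of-memory error described above.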

5. Error condition and exception handling

Given the complications of mobile development, some errors are inevitable, whether it’s an unexpected API change, a memory problem that escaped earlier detection, or a network condition that ends connectivity or simply slows data speeds during large file transfers or API calls. What stands between such a situation and a crash is good error and exception handling.

How to avoid them?

With proper error checks and exception handling, your app won’t be thrown off by an unexpected attempt to divide by zero, an incorrectly entered response from a user, an API that suddenly starts providing text instead of a numeric value, or a temporary loss of connectivity.

In any of these cases, properly written handling code within your app will deal with the unexpected situation and give it a graceful way to terminate a process or activity while informing the user of the error. It may not be ideal, but if you can keep the lines of communication open, there’s a better chance you’ll keep the user.
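Two of those cases, a zero divisor and text arriving where a number was expected, can be sketched in plain Java as follows. The `SafeMath` class, its `Result` type, and the user-facing messages are all hypothetical; a real app would surface the message through its own UI.

```java
/** Hypothetical helper showing defensive handling of untrusted input. */
final class SafeMath {

    /** Outcome of an operation that may fail, with a message fit to show the user. */
    static final class Result {
        final boolean ok;
        final double value;
        final String userMessage;

        private Result(boolean ok, double value, String userMessage) {
            this.ok = ok;
            this.value = value;
            this.userMessage = userMessage;
        }

        static Result success(double value) {
            return new Result(true, value, "");
        }

        static Result failure(String userMessage) {
            return new Result(false, Double.NaN, userMessage);
        }
    }

    /** Computes total / itemCount, where the total arrives as text from an API. */
    static Result averagePerItem(String totalFromApi, int itemCount) {
        if (itemCount == 0) {
            return Result.failure("There is nothing to show yet."); // never divide by zero
        }
        if (totalFromApi == null) {
            return Result.failure("We could not load the data. Please try again.");
        }
        try {
            double total = Double.parseDouble(totalFromApi.trim());
            return Result.success(total / itemCount);
        } catch (NumberFormatException e) {
            // The API sent text where a number was expected.
            return Result.failure("We could not read the data. Please try again.");
        }
    }

    public static void main(String[] args) {
        System.out.println(averagePerItem("100", 4).value);
        System.out.println(averagePerItem("n/a", 4).userMessage);
        System.out.println(averagePerItem("100", 0).userMessage);
    }
}
```

Returning a result object rather than throwing keeps the failure visible to the caller, which can then show the message and carry on instead of crashing.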

6. Multiple live app versions

A mobile app is a complete piece of software that resides on the user’s phone. Rolling out a new update doesn’t mean the app gets updated immediately on the customer’s phone, because customers do not update apps right away. While this is getting better with Android’s automatic updates, you will most likely have to keep supporting older versions of the code. Any change in the backend API could potentially affect app stability and cause crashes.

How to avoid them?

Backward compatibility of the APIs is the key to avoiding crashes caused by multiple live versions of the app. With each new API change, you need to check the old versions of the app and make sure they don’t break.
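On the client side, one complementary habit is to parse responses tolerantly, so that an already-installed version survives fields being added, removed, or retyped. A minimal plain-Java sketch follows. The `LoanSummary` class, the field names, and the default values are hypothetical, and the response is assumed to have been decoded into a `Map` already.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical response model; field names and defaults are invented for illustration. */
final class LoanSummary {
    final String accountId;
    final double outstanding;
    final String currency;

    private LoanSummary(String accountId, double outstanding, String currency) {
        this.accountId = accountId;
        this.outstanding = outstanding;
        this.currency = currency;
    }

    /** Builds the model from a decoded response, tolerating missing, extra, or retyped fields. */
    static LoanSummary fromResponse(Map<String, Object> response) {
        return new LoanSummary(
                asString(response.get("accountId"), ""),
                asDouble(response.get("outstanding"), 0.0),
                asString(response.get("currency"), "INR")); // assumed to arrive only in newer API versions
    }

    private static String asString(Object value, String fallback) {
        return (value instanceof String) ? (String) value : fallback;
    }

    private static double asDouble(Object value, double fallback) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        if (value instanceof String) {
            try {
                return Double.parseDouble(((String) value).trim());
            } catch (NumberFormatException e) {
                return fallback; // unreadable text: use the default instead of crashing
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Map<String, Object> oldResponse = new HashMap<>();
        oldResponse.put("accountId", "A-1");
        oldResponse.put("outstanding", 1500);

        Map<String, Object> newResponse = new HashMap<>(oldResponse);
        newResponse.put("outstanding", "1500.50"); // type changed on the server
        newResponse.put("branch", "Pune");         // unknown field, ignored by this client

        System.out.println(fromResponse(oldResponse).currency);
        System.out.println(fromResponse(newResponse).outstanding);
    }
}
```

This does not replace testing old app versions against each API change; it only reduces how often such a change turns into a crash.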

Google is also taking this problem seriously: at Google I/O 2019 it announced an in-app update API that enables seamless updates from within the app. Implementing this could reduce the number of live versions of your app, and with it the chance of version-related crashes.

99% Crash Free Users: Gold Standard

In the mobile ecosystem, especially on Android, it’s recommended that an app strive for a crash-free user rate greater than 99%. Having 99% crash-free users is the gold standard for any Android app, but many apps fall below this level at times. Our customer-facing Android app has an average crash-free user rate of 99.7%, and while a gain of 0.7 percentage points over the recommendation may seem small, it is very significant. For an application used by more than 1 crore (10 million) customers to handle their gold loans and repayments, it’s hugely important that customers can trust the app to provide a consistently high-quality service.

Goals: Measure what matters

There are two different ways that we can measure crashes of the app: crash-free users and crash-free sessions.

Crash-free users is the percentage of users that have not experienced a crash in the given timeframe.

Crash-free sessions is the percentage of total sessions that have not resulted in a crash in the given timeframe.

Both are equally important to track. Crash-free users shows you what percentage of users are affected by crashes. Crash-free sessions shows you how frequently the app is crashing.
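To make the two definitions concrete, here is a minimal Java sketch of the arithmetic. The class name, method names, and sample numbers are made up for illustration; in practice Crashlytics reports these figures for you.

```java
/** Illustrative only: the names and sample numbers are hypothetical. */
final class CrashMetrics {

    /** Percentage of users who did not experience a crash in the timeframe. */
    static double crashFreeUsers(long totalUsers, long usersWithCrash) {
        return percentWithout(totalUsers, usersWithCrash);
    }

    /** Percentage of sessions that did not result in a crash in the timeframe. */
    static double crashFreeSessions(long totalSessions, long crashedSessions) {
        return percentWithout(totalSessions, crashedSessions);
    }

    private static double percentWithout(long total, long affected) {
        if (total <= 0) {
            return 100.0; // no users or sessions means nothing crashed
        }
        return 100.0 * (total - affected) / total;
    }

    public static void main(String[] args) {
        // Made-up sample: 10,000 users, 50 of whom crashed at least once.
        System.out.println(crashFreeUsers(10_000, 50));
        // Made-up sample: 40,000 sessions, 60 of which crashed.
        System.out.println(crashFreeSessions(40_000, 60));
    }
}
```

With these made-up numbers, 99.5% of users and 99.85% of sessions are crash-free. The two figures differ because one user can crash in several sessions, and most users have many sessions that never crash.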

Goals:

  • 99.5%+ crash-free users
  • 99.8%+ crash-free sessions

One of the main reasons we’re so proud of our 99.7% crash-free users rate is the work we’ve done to get it there. When the app was first launched, this figure was around 97% or 98%. That may work for small startups, but as we scale, customer experience becomes more and more important.

Best Development Practices for Avoiding Crashes

Code review

Preventing crashes often starts with quality code review. It is the responsibility of both the developer of a feature and the code reviewers to ensure that we aren’t introducing new crashes. This isn’t limited to crashes today; it includes crashes in the future. Engineers need to consider future uses of their code and how our API responses might change over time. Defensive programming is the key to a sustainable codebase.

Always measure

We’ve been using Crashlytics to monitor crash rates, and have been tracking our progress informally over the years. All of our mobile engineers share the responsibility of checking in on the latest release, especially to see whether the features they’ve built introduced any new crashes. Any new crash identified gets picked up in the upcoming sprint and rolled out with the next app update.

Final Thoughts:

Developing and maintaining Android apps is a continuous process. Benchmarking and continuous improvement are among the main ways to ensure the success of your application. Although 99% crash-free users is a good enough number, every developer should try to get as close as possible to 99.9% crash-free users for their app. Getting there requires continuous effort over many months.

I hope you’ve found the article interesting & useful. If so, feel free to like and comment and share your views.

Raviraj Desai

Assistant Vice President at HDFC Bank (Android — Kotlin lover. Tech stack: MVVM, Dagger2, Coroutines, RxJava2 ,Flutter Enthusiast ,Ejabber