99.99% Crash-free users – How did we do it?

Briony Jones



At Nutmeg, we know that being able to keep up to date with your investments wherever you are is incredibly important to our customers. It’s one of the reasons we have a dedicated team of developers working on our iOS and Android apps, looking for ways to improve and develop them. We’re very proud of how we’ve rewritten the native iOS app to reach a near perfect average crash-free user rate – here’s more on how we’ve done it.

It’s recommended that an app should strive for a crash-free user rate of greater than 99%[1], but many apps fall below this level at times. The Nutmeg native iOS app has an average crash-free user rate of 99.99%, and while a gain of 0.99 percentage points on the recommendation may seem small, it is very significant: across 30,000 users, a 99% rate means around 300 people experiencing a crash, while 99.99% means around three. For an application that is used by more than 30,000 iOS customers to handle their money and their investments, it’s hugely important that customers can trust the app to provide a consistently high-quality service.

How we measure app crashes

There are two different ways that we can measure crashes of the app: crash-free users and crash-free sessions. Crash-free users is the percentage of users that have not experienced a crash in the given timeframe; and crash-free sessions is the percentage of total sessions that have not resulted in a crash in the given timeframe.

You would expect the crash-free sessions rate to be the higher of the two. The total number of sessions is much higher than the total number of users, because each user has multiple sessions over a given timeframe, so a single crash affects only one of many sessions but counts against that user for the whole timeframe.
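To make the difference between the two metrics concrete, here is a minimal sketch in Swift. The `Session` type, the function names and the sample data are all hypothetical and chosen for illustration; they are not taken from Crashlytics or from the Nutmeg codebase.

```swift
/// One app session: who ran it and whether it ended in a crash (hypothetical type).
struct Session {
    let userID: String
    let crashed: Bool
}

/// Percentage of sessions that did not end in a crash.
func crashFreeSessionRate(_ sessions: [Session]) -> Double {
    guard !sessions.isEmpty else { return 100 }
    let crashFree = sessions.filter { !$0.crashed }.count
    return 100 * Double(crashFree) / Double(sessions.count)
}

/// Percentage of distinct users who had no crashed session at all.
func crashFreeUserRate(_ sessions: [Session]) -> Double {
    let allUsers = Set(sessions.map { $0.userID })
    guard !allUsers.isEmpty else { return 100 }
    let crashedUsers = Set(sessions.filter { $0.crashed }.map { $0.userID })
    return 100 * Double(allUsers.count - crashedUsers.count) / Double(allUsers.count)
}

// Made-up data: 4 users with 5 sessions each, and a single crash for user "a".
var sessions: [Session] = []
for user in ["a", "b", "c", "d"] {
    for _ in 0..<5 {
        sessions.append(Session(userID: user, crashed: false))
    }
}
sessions[0] = Session(userID: "a", crashed: true)

print(crashFreeSessionRate(sessions)) // 19 of 20 sessions are crash-free: 95.0
print(crashFreeUserRate(sessions))    // 3 of 4 users are crash-free: 75.0
```

The same single crash costs five percentage points of the sessions rate but twenty-five of the users rate, which is why the users figure is the stricter of the two.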

For this blog, we’re looking at crash-free users, where the ‘crash rate’ refers to the percentage of users who experienced a crash. We look at users, rather than sessions, because this better shows the experience of customers.

It’s not always been so high

One of the main reasons we’re so proud of our 99.99% crash-free users rate is the work we’ve done to get it there. When the app was first launched, this figure was around 97% or 98%, which – while not uncommon for new apps – was something that we wanted to investigate and improve for our customers.

With the help of the Crashlytics monitoring tool, we quickly identified that our use and implementation of the Core Data framework was causing most of the crashes we were seeing.

Core Data

Core Data is a persistence framework provided by Apple that can be extremely powerful and useful if implemented correctly and purposefully. When we launched version one of the app, we implemented a simple single-threaded networking and persistence layer using Core Data. We would:
1. Make a request to retrieve data from the API (via the networking layer)
2. Use factory methods to transform the resulting data into NSManagedObjects and store them in Core Data
3. Read the data from Core Data and use or display it in the views

This implementation worked well for the first release, with a small feature set and few data requirements. However, as soon as the app started to grow, with new features and requirements being added, it started to cause issues because the mutable NSManagedObjects could be written to and read at the same time, and the crash rate started creeping up.

Some improvements were initially made by implementing a multi-threaded environment, using multiple NSPersistentStoreCoordinators[2] to mitigate the problems we were seeing when interacting with the data. This significantly improved our crash rate, but not as much as we wanted. One of the problems with Core Data is that crashes can be hard to reproduce and debug, and are therefore hard to solve.

While Core Data is beneficial for persistence, long-term persistence is not really necessary for the Nutmeg app. For customers checking their portfolios we want to show up-to-date data – there’s no benefit in showing them their portfolio value from the last time they opened the application two weeks ago if the value has changed significantly in the meantime. If for some reason we are unable to show them live data, or at least very up-to-date data, then we are better off showing an error screen and encouraging them to come back and try again later – no data is better than incorrect data in this situation.

The app after Core Data

So, we decided to remove Core Data altogether. Undertaking a large refactor of the app, we made use of protocols and custom-defined data sources to implement a repository pattern and an in-memory caching layer, which had:
• Configurable expiration times per entity. For example, the User object is unlikely to change very often, so it can have a longer expiration time. For the portfolio value, on the other hand, we always want the latest possible data, as it changes often due to payments and trading etc, so it has a much shorter expiration time.
• A configurable storage policy. This allows each entity to specify, depending on its requirements, whether a given request uses the API or the cache. Data is retrieved from the in-memory cache if the policy allows it, it exists, and it is valid (meaning it has been updated within the given expiration time); otherwise a network request is made to the API to retrieve it.
• Immutable entity objects instead of the mutable NSManagedObject used by Core Data. Immutable models are inherently thread safe because they cannot change after they are created, so they can be read from several threads at once, which becomes especially important as the app grows.

The entity model looks something like this:
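As a rough illustration only (a minimal sketch, not Nutmeg’s actual code), an immutable entity with a per-entity expiration time might be modelled as follows. The `Entity` protocol, the `CacheEntry` wrapper, the field names and the expiration values are all assumptions made for this example.

```swift
import Foundation

/// Hypothetical protocol: every cacheable entity declares how long it stays valid.
protocol Entity {
    static var expirationTime: TimeInterval { get }
}

/// Immutable value type: every property is a `let`, so an instance can be
/// read from several threads without changing underneath the reader.
struct User: Entity {
    static let expirationTime: TimeInterval = 60 * 60 // assumed value: one hour
    let id: String
    let firstName: String
}

struct PortfolioValue: Entity {
    static let expirationTime: TimeInterval = 30 // assumed value: thirty seconds
    let potID: String
    let amount: Decimal
}

/// A cached entity together with the time it was stored.
struct CacheEntry<E: Entity> {
    let entity: E
    let storedAt: Date

    /// An entry is valid while it is younger than its entity's expiration time.
    func isValid(at now: Date = Date()) -> Bool {
        return now.timeIntervalSince(storedAt) < E.expirationTime
    }
}

let stored = Date()
let entry = CacheEntry(entity: PortfolioValue(potID: "pot-1", amount: 1250),
                       storedAt: stored)
print(entry.isValid(at: stored.addingTimeInterval(10))) // true: 10 s is within 30 s
print(entry.isValid(at: stored.addingTimeInterval(60))) // false: 60 s has expired
```

Because the expiration time lives on the entity type, each entity can choose its own freshness requirement without the cache needing to know about it.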

And the repository looks something like this:
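Again as a hedged sketch rather than the real implementation, a repository with an in-memory cache and a storage policy could be written along these lines. `StoragePolicy`, `InMemoryRepository`, the injectable clock and the `fetchFromAPI` closure (which stands in for the networking layer) are hypothetical names, and the class below is deliberately not synchronised, so a production version would need to protect the cache when used from several threads.

```swift
import Foundation

/// Hypothetical storage policy: may a request be served from the cache?
enum StoragePolicy {
    case cacheThenAPI // use a valid cached value, otherwise call the API
    case apiOnly      // always call the API
}

/// Minimal in-memory repository for one entity type (sketch, not synchronised).
final class InMemoryRepository<Value> {
    private var cached: (value: Value, storedAt: Date)?
    private let expirationTime: TimeInterval
    private let now: () -> Date                  // injectable clock, useful in tests
    private let fetchFromAPI: () throws -> Value // stands in for the networking layer

    init(expirationTime: TimeInterval,
         now: @escaping () -> Date = { Date() },
         fetchFromAPI: @escaping () throws -> Value) {
        self.expirationTime = expirationTime
        self.now = now
        self.fetchFromAPI = fetchFromAPI
    }

    func get(policy: StoragePolicy = .cacheThenAPI) throws -> Value {
        // Serve from the cache only if allowed, present and not yet expired.
        if policy == .cacheThenAPI,
           let entry = cached,
           now().timeIntervalSince(entry.storedAt) < expirationTime {
            return entry.value
        }
        // Otherwise go to the API; an error propagates instead of stale data.
        let fresh = try fetchFromAPI()
        cached = (value: fresh, storedAt: now())
        return fresh
    }
}

// Usage with a fake API that counts how often it is called.
var apiCalls = 0
var clock = Date()
let repository = InMemoryRepository<Int>(expirationTime: 30, now: { clock }) {
    apiCalls += 1
    return 1000 + apiCalls
}

let first = try? repository.get()                  // cache is empty, so the API is called
let second = try? repository.get()                 // still valid, so served from the cache
clock = clock.addingTimeInterval(60)
let third = try? repository.get()                  // expired, so the API is called again
let forced = try? repository.get(policy: .apiOnly) // the policy bypasses the cache
print(apiCalls) // 3
```

Letting the fetch error propagate, rather than falling back to an expired value, matches the decision that no data is better than incorrect data.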

The only real drawback to this implementation was that we were no longer able to offer an offline mode to customers without the persistence of Core Data. However, we decided this was a worthwhile compromise, based on the available functionality of the app where very few features would be useful without an online connection. The benefits of this new design pattern, including a reduced crash rate as well as improved scalability and better suiting our business needs, outweighed this drawback.

Improved architecture

As with most iOS Swift projects, the application was first built using Apple’s recommended architecture of MVC – Model-View-Controller[3]. Again, this worked well initially, but as the feature set and the application grew we found – as many others do – that MVC becomes Massive-View-Controller: huge view controller classes that contain the majority of the code, making it unreadable, unmanageable and incredibly hard to unit test well.

To combat this, we introduced MVVM – Model-View-ViewModel[4], which provides a much more granular separation of the business logic (which lives in the ViewModel) from the user interface (UI) layer. This allows us to test the business logic much better through unit tests. Increased unit test coverage can greatly help keep the crash rate down, by covering more scenarios and edge cases. In the past 18 months we have migrated nearly all of our legacy MVC code to MVVM, and our unit test coverage alone has increased by nearly 30%.
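The testability benefit can be illustrated with a small sketch. The `Pot` model and `PotViewModel` below are hypothetical and much simpler than anything in the real app; the point is only that the ViewModel imports no UI framework, so its logic can be exercised directly in a unit test.

```swift
/// Model: plain immutable data (hypothetical fields).
struct Pot {
    let name: String
    let valueInPence: Int
}

/// ViewModel: turns the model into display-ready strings.
/// It depends on no UI framework, so it can be unit tested on its own.
struct PotViewModel {
    let title: String
    let formattedValue: String

    init(pot: Pot) {
        // Edge case: a pot without a name still gets a sensible title.
        title = pot.name.isEmpty ? "Unnamed pot" : pot.name

        // Format pence as pounds with exactly two decimal places.
        let sign = pot.valueInPence < 0 ? "-" : ""
        let absolute = abs(pot.valueInPence)
        let pence = absolute % 100
        let paddedPence = pence < 10 ? "0\(pence)" : "\(pence)"
        formattedValue = "\(sign)£\(absolute / 100).\(paddedPence)"
    }
}

let viewModel = PotViewModel(pot: Pot(name: "", valueInPence: 123405))
print(viewModel.title)          // Unnamed pot
print(viewModel.formattedValue) // £1234.05
```

A view controller would then only assign `title` and `formattedValue` to its labels, leaving very little logic in the UI layer to go wrong.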

No more interface builder

Interface Builder is a feature of Xcode, the primary Integrated Development Environment (IDE) used by Swift developers. It allows us to design parts of the app as storyboards and XIB files with a visual representation instead of writing code.

The components in Interface Builder are then connected to the .swift file through the use of IBOutlets and IBActions: variables and functions which reference the component in Interface Builder. On the right-hand side of the screenshot below you can see all the connected references of a single screen.

One of the most common causes of crashes is a disconnected outlet – a reference which is still present in Interface Builder, but no longer in the .swift file. This is not caught by the compiler and causes a runtime exception when the app is launched and the view is loaded.

In the past year we’ve moved towards an approach of creating views programmatically, which can initially take a bit more development time, but can reduce the chances of crashes caused by disconnected outlets. Additionally, we’ve found this method is far more reusable and therefore efficient in the long run.

We have also almost entirely removed storyboard files from our project by adopting the Coordinator pattern to handle navigation within the app instead. Storyboards make use of static String identifiers, called segue identifiers, which the compiler cannot check. A missing or misspelt identifier therefore goes unnoticed at compile time and can lead to a runtime exception when the app navigates from one screen to another.
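A simplified sketch shows why this helps. Here the screens are modelled as an enum, so a misspelt destination fails to compile rather than failing at runtime. The `Route` cases and the `AppCoordinator` class are hypothetical, and a real coordinator would push view controllers onto a navigation controller rather than just recording routes.

```swift
/// Hypothetical screens, modelled as an enum instead of String segue identifiers.
enum Route: Equatable {
    case portfolioList
    case potDetail(potID: String)
    case settings
}

/// Minimal coordinator: owns the navigation stack and decides what is shown next.
/// In a real app it would present view controllers; here it only records routes.
final class AppCoordinator {
    private(set) var stack: [Route] = [.portfolioList]

    func show(_ route: Route) {
        stack.append(route)
    }

    func back() {
        // Never pop the root screen.
        if stack.count > 1 {
            stack.removeLast()
        }
    }

    /// Exhaustive switch: adding a Route case without handling it here is a compile error.
    func title(for route: Route) -> String {
        switch route {
        case .portfolioList:
            return "Your portfolio"
        case .potDetail(let potID):
            return "Pot \(potID)"
        case .settings:
            return "Settings"
        }
    }
}

let coordinator = AppCoordinator()
coordinator.show(.potDetail(potID: "pot-1"))
// coordinator.show(.potDetial(potID: "pot-1")) // misspelt route: does not compile
print(coordinator.title(for: coordinator.stack[1])) // Pot pot-1
```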

SwiftLint

In Swift, there are some language features that are legitimate but can be dangerous when used incorrectly, such as force unwrapping optional variables[5]. Invalid uses of such features are not caught at compile time and instead cause an error at runtime, resulting in a crash.

We discourage developers from using such features and encourage them to code more defensively instead. To help with this we use SwiftLint[6] – a tool that allows you to configure warnings and errors for certain coding styles to enforce best practice.
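As an illustration of what coding defensively means here, the sketch below contrasts a force unwrap with a guarded alternative. The payload, its keys and the `riskLevel(from:)` function are made up for the example. The commented-out line is the kind of code that SwiftLint’s opt-in `force_unwrapping` rule is designed to flag.

```swift
/// Hypothetical API payload in which any field may be missing or malformed.
let payload: [String: String] = ["potName": "Rainy day", "risk": "7"]

// Unsafe: compiles, but crashes at runtime if the key is missing
// or the text is not a number.
// let risk = Int(payload["risk"]!)!

/// Defensive alternative: returns nil instead of crashing on bad data,
/// so the caller is forced to decide what to do when the value is absent.
func riskLevel(from payload: [String: String]) -> Int? {
    guard let text = payload["risk"], let level = Int(text) else {
        return nil
    }
    return level
}

if let level = riskLevel(from: payload) {
    print("Risk level \(level)") // Risk level 7
} else {
    print("Risk level unavailable")
}
```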

So, why not 100%?

With all these measures and the improvements we’ve made, you’d have thought we could have ruled out crashes entirely. But a 100% crash-free rate is nigh-on impossible. There are a couple of reasons for this. First, we use third-party libraries within the application, and crashes can occur within these where we have no control over the code. Second, we have a finite data set for testing. At Nutmeg we have over 80,000 customers. Each customer account will have its own data set, dependent on the configuration of their account – for example, the number and type of pots, the style and risk level – and this data will vary hugely with how long they’ve been invested, their rate of payments and withdrawals, and the market effects on their portfolio over that time. It is not feasible to test every single scenario for every single customer, and so sometimes there can be edge cases, caused by the data we receive or the state the application is in for a given customer’s account, that cause an exception.

Crash monitoring and reporting

As we know we cannot guarantee a zero crash rate, the most important thing we can do is consistently monitor and report on any crashes we do see, in order to resolve them as quickly and effectively as possible. As you have seen throughout this post, monitoring has been key to reducing the crash rate, and it is key to keeping the crash rate low going forward.

Firebase Crashlytics alerts us any time a crash happens and provides us with the information to debug and try to reproduce the crash, work out what caused it and implement a fix. Not only does it alert us to individual crashes, it also keeps track of things such as the crash-free user rate we have discussed here, which is crucial for spotting whether more crashes are creeping in.

Risk warning

As with all investing, your capital is at risk. The value of your portfolio with Nutmeg can go down as well as up and you may get back less than you invest.

Sources

[1] https://qualitrix.com/blog/app-crash-percentage/

[2] https://developer.apple.com/documentation/coredata/nspersistentstorecoordinator

[3] https://developer.apple.com/library/archive/documentation/General/Conceptual/DevPedia-CocoaCore/MVC.html

[4] https://www.raywenderlich.com/34-design-patterns-by-tutorials-mvvm

[5] https://medium.com/@agoiabeladeyemi/optionals-in-swift-2b141f12f870

[6] https://github.com/realm/SwiftLint
