Designing a Database: Realm Threading Deep Dive

by Realm Team

Apr 1 2016

So you’ve read the basics on Realm threading. You’ve learned that you don’t need to worry about much, or do much, to harness the power of Realm when working with threads, but you’re still itching for more…

You want to know how Realm works under the hood. You want to know the theory, the how, the why. Well, you’ve come to the right place. We’ve got all the juicy details here. In this document we’ll explain how and why Realm is built the way it is and why it matters.

Let’s get right to it:

“Complexity is your enemy. Any fool can make something complicated. It is hard to make something simple.”

Sir Richard Branson

This quote is important because it showcases what we’re trying to evangelize. We’ve taken something very difficult and made it easy for developers to do - threading, concurrency, data consistency, and more, all of which are hard to get right. Our brains are not wired to model concurrency without a ton of trial & error, and even then we all end up making silly mistakes. The aim of Realm is to solve these issues for you.

The Underpinnings of Realm

Realm is an MVCC database built from the ground up in C++. MVCC stands for Multiversion Concurrency Control.

It’s not as complicated as it sounds, trust us. Hang in there and you’ll have the light bulb moment in just a second. 💡

MVCC solves an important concurrency problem: in every database there comes a time when someone is reading while someone else is writing (for example, different threads reading from and writing to the same database). This can produce inconsistent data, for instance when you read records while a write operation is only partially complete. If the database allows this, the result you get back will be inconsistent with what eventually ends up in the database.

This is bad.

At this point, your view of the data is different than what is in the database. That’s inconsistent and not reliable. Yikes.

You want your database to be ACID:

Atomic
Consistent
Isolated
Durable

There are multiple ways to solve this read/write problem and the most common is to slap a lock on the database. Given our previous situation, a lock would be applied when the write was in progress to the DB. This would in turn prevent the read operation(s) from continuing until the write completes. This is known as a read-write lock. This is usually very slow.
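To make the cost of that lock concrete, here is a minimal sketch using the JDK's ReentrantReadWriteLock. The class and method names are ours, purely for illustration; this is not how any particular database implements its locking.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class: shows a reader being shut out while a write is in progress.
final class ReadWriteLockDemo {

    // Returns true only if a reader thread could take the read lock
    // while the calling thread still holds the write lock.
    static boolean readerCanEnterWhileWriting() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] acquired = new boolean[1];

        lock.writeLock().lock(); // the "write in progress"
        try {
            Thread reader = new Thread(() -> {
                // tryLock() does not wait; it reports whether the read could proceed right now.
                acquired[0] = lock.readLock().tryLock();
                if (acquired[0]) {
                    lock.readLock().unlock();
                }
            });
            reader.start();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.writeLock().unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println(readerCanEnterWhileWriting()); // false
    }
}
```

With a blocking lock() call instead of tryLock(), the reader would simply sit and wait until the writer finished, which is exactly the stall described above.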

This is where Realm’s MVCC design decision comes into play.

Realm is an MVCC database

An MVCC database like Realm takes an alternative approach: each connected thread will see a snapshot of the data at a particular point in time.

What does that really mean though?

MVCC is the same design choice that powers source control systems like Git. You can visualize the internals of Realm much like that of Git, with the concepts of branches and atomic commits. This means you can be working on many branches (database versions) without having a full copy of the data. Realm is a bit different than a true MVCC database though. With a true MVCC system, like Git, you can have multiple candidates for what will become the HEAD of the tree. With Realm, only a single writer can be operating at any time and will always work on the latest version - it cannot work on an earlier one.
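The snapshot idea can be sketched in a few lines of plain Java. Everything here (SnapshotStore, snapshot, commit) is a hypothetical name invented for this illustration, and the sketch copies the whole list on every commit, which Realm does not do. It only shows the two rules from the paragraph above: readers keep the version they started with, and a single writer always builds on the latest version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of snapshot isolation; not Realm's implementation.
final class SnapshotStore {
    // Points at the latest committed, immutable version (think of Git's HEAD).
    private final AtomicReference<List<String>> head =
            new AtomicReference<>(Collections.emptyList());

    // A reader pins whichever version is current when it asks.
    List<String> snapshot() {
        return head.get();
    }

    // synchronized: only one writer at a time, always starting from the latest version.
    synchronized void commit(String newRecord) {
        List<String> next = new ArrayList<>(head.get());
        next.add(newRecord);
        head.set(Collections.unmodifiableList(next));
    }

    public static void main(String[] args) {
        SnapshotStore store = new SnapshotStore();
        store.commit("Rex");
        List<String> readerView = store.snapshot();  // reader pins version 1
        store.commit("Fido");                        // writer publishes version 2
        System.out.println(readerView.size());       // 1
        System.out.println(store.snapshot().size()); // 2
    }
}
```

The reader never blocks and never sees a half-finished write, because the version it holds is never modified.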


Furthermore, Realm is similar to a gigantic tree data structure (a B‑tree to be exact), and at any point in time, you have the top level node, labelled R below (similar to Git’s HEAD commit).

Simplified Realm tree

As you make changes, copy-on-write behavior is taking place. Copy-on-write means that you’re forking the tree and writing without modifying the existing data.

Copy-on-write in Realm

By using this approach, if something goes wrong during the write transaction, the original data will still be intact, leaving the top level pointer still pointing to the non-corrupt data because you were writing elsewhere. A write is verified by Realm via a two‑phase commit concept where Realm verifies everything is written to disk and the data is safe. Only at that point will Realm move the pointer over and say, “Okay, this is the new official version.” This means that the worst thing that can happen during a write transaction is that you’ll only lose the data you’re updating, not the whole Realm database.
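Here is a rough sketch of copy-on-write with a pointer swap. It uses a plain binary search tree rather than a B‑tree to stay short, works purely in memory (so there is no disk verification step), and simulates a failed write with a negative key. CowTree and its methods are names made up for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical copy-on-write tree; a simplification, not Realm's file format.
final class CowTree {
    // Immutable node: once published it is never modified.
    static final class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // The top-level pointer (R in the diagrams above).
    private final AtomicReference<Node> root = new AtomicReference<>(null);

    // Returns a new root; only the nodes on the path to the key are copied.
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key, null, null);
        if (key < n.key) return new Node(n.key, insert(n.left, key), n.right);
        if (key > n.key) return new Node(n.key, n.left, insert(n.right, key));
        return n;
    }

    static boolean contains(Node n, int key) {
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    Node current() {
        return root.get();
    }

    // Builds the new version off to the side; moves the pointer only if every step succeeds.
    synchronized boolean write(int... keys) {
        Node candidate = root.get();
        try {
            for (int k : keys) {
                if (k < 0) throw new IllegalArgumentException("simulated write failure");
                candidate = insert(candidate, k);
            }
        } catch (RuntimeException e) {
            return false; // the published root was never touched
        }
        root.set(candidate); // "this is the new official version"
        return true;
    }

    public static void main(String[] args) {
        CowTree tree = new CowTree();
        tree.write(5, 3, 8);
        Node before = tree.current();
        System.out.println(tree.write(1, -1));        // false: the write failed
        System.out.println(tree.current() == before); // true: old version still official
    }
}
```

Notice that a successful insert into the left side of the tree reuses the untouched right subtree as-is, which is why many versions can coexist without full copies of the data.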

Realm Objects & Relationships

Another interesting fact about Realm is that object relationships are native references, and, because of the zero‑copy architecture that Realm uses, there is close-to-zero memory footprint. This is possible because each Realm object talks directly to the underlying database with a native long pointer as the hook to the data in the database.

Realm avoids most of the unnecessary slow bit shuffling and slow memory copying that needs to happen with traditional database access technologies.

Why does this matter?

The reason is simple. There is no need to perform additional work to obtain the referenced objects. The referenced objects are first‑class citizens. This helps with performance greatly: No need to perform additional querying or expensive joins.

Furthermore, all mobile devices are memory‑constrained. Keeping memory pressure low with Realm helps prevent out‑of‑memory situations and other memory pressure problems.

Zero‑copy explained, & why it is so fast

Realm is built upon a zero‑copy architecture. To understand the power of zero‑copy and why it is important, let’s quickly review how data is retrieved with traditional ORMs (object‑relational mappers).

Traditional object retrieval from an ORM/Core Data/etc

Most of the time, you have data stored in a database file on a disk. A developer requests data in the form of an object that is native to the platform (e.g., Android, iOS) from the persistence mechanism (e.g., ORM, Core Data). At that time, the persistence mechanism will translate the request into a series of SQL statements, create a database connection (if it hasn’t already been created), send it to the disk, perform the query, read all the data from the rows that match the query, and then bring all of that into memory (that’s the memory allocation). At that point you have to deserialize the data format to a format that can be stored in memory, which means aligning the bits so the CPU can deal with them. Finally, the data needs to be transformed into the language‑level type, and then it is returned to the requester via an object that the platform can interact with (POJO, NSManagedObject, etc). This is even more complicated if you have child or list references in a persistence mechanism. At that point, this cycle can happen over and over (depending on the persistence mechanism and the configuration). If you’re using a homegrown solution, the situation remains mostly the same.

As you can tell, there’s a lot going on in order to get the data into a data structure that you can use in your application.

Realm object retrieval

Realm’s approach is different. This is where our decision to use a zero‑copy architecture comes into play.

Realm skips the entire copy process because the database file is always memory‑mapped. Realm accesses any offset in the file as if it were already in memory even though it’s not - it’s virtual memory. This is an important design decision for the core Realm file format. It allows the file to be readable in memory without having to do any deserialization. Realm skips all those expensive steps that traditional persistence mechanisms have to go through. Realm simply calculates offsets to find data in the file and returns that value from the raw accessors on the data structure (POJO/NSManagedObject/etc). This is much more efficient and therefore much faster.
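To get a feel for reading a value straight from a memory‑mapped file by offset, here is a small sketch built on the JDK's FileChannel and MappedByteBuffer. The fixed-width "one int per row" layout and the MappedAges class are assumptions made up for this example; Realm's actual file format is different and lives in its C++ core.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout: row N's age is a 4-byte int stored at offset N * 4.
final class MappedAges {
    static final int RECORD_SIZE = Integer.BYTES;

    // No query, no deserialization step: compute the offset and read the value.
    static int readAge(MappedByteBuffer map, int row) {
        return map.getInt(row * RECORD_SIZE);
    }

    // Writes three ages into a temp file through the mapping, then reads row 1 back.
    static int writeThenReadThroughMapping() {
        try {
            Path file = Files.createTempFile("ages", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer map =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, 3L * RECORD_SIZE);
                map.putInt(0 * RECORD_SIZE, 1);
                map.putInt(1 * RECORD_SIZE, 3);
                map.putInt(2 * RECORD_SIZE, 7);
                return readAge(map, 1);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenReadThroughMapping()); // 3
    }
}
```

The operating system pages the file in on demand, so the accessor is little more than pointer arithmetic plus a read.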

Relationships in Realm are also extremely fast because they’re indexes that traverse a B‑tree–like structure to the related object. This is much faster than querying. Because of this there is no need to perform another full query as ORMs do. It simply is a native pointer to the related object. That’s all there is to be done.

Auto-Updating Objects & Queries

The zero‑copy architecture provides more than just speed. Realm objects and Realm query results are live, auto‑updating views into the underlying data, which means results never have to be re‑fetched. Modifying objects that affect the query will be reflected in the results immediately.

Assume the following code:

Java

RealmResults<Dog> puppies = realm.where(Dog.class).lessThan("age", 2).findAll();
puppies.size(); // => 0

realm.beginTransaction();
Dog dog = realm.createObject(Dog.class);
dog.setAge(1);
realm.commitTransaction();

puppies.size(); // => 1

// Change the dog from another query
realm.beginTransaction();
Dog theDog = realm.where(Dog.class).equalTo("age", 1).findFirst();
theDog.setAge(3);
realm.commitTransaction();

// Original dog is auto-updated
dog.getAge();   // => 3
puppies.size(); // => 0

Swift

let puppies = realm.objects(Dog).filter("age < 2")
puppies.count // => 0 because no dogs have been added to the Realm yet

let myDog = Dog()
myDog.name = "Rex"
myDog.age = 1

try! realm.write {
    realm.add(myDog)
}

puppies.count // => 1 updated in real-time

// Access the Dog in a separate query
let puppy = realm.objects(Dog).filter("age == 1").first // optional: nil if nothing matches
try! realm.write {
    puppy?.age = 3
}

// Original Dog object is auto-updated
myDog.age // => 3
puppies.count // => 0

As soon as the Dog object is created and committed to Realm, the puppies query result is automatically updated with the new values. If we change the dog via another query, the original dog instance will automatically be updated as well.
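If you're wondering how a result set can update itself, one way to picture it is that the result stores the query rather than a copy of the matching rows. The sketch below is single-threaded, uses a plain list as a stand-in for the database, and every name in it (LiveAges, View, where) is hypothetical. It mirrors the puppies example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical live view over a list of ages; an analogy, not Realm's mechanism.
final class LiveAges {
    private final List<Integer> ages = new ArrayList<>(); // stands in for the database

    void add(int age) {
        ages.add(age);
    }

    void set(int index, int age) {
        ages.set(index, age);
    }

    // A view keeps the query and evaluates it against the current data on every access.
    final class View {
        private final IntPredicate query;

        View(IntPredicate query) {
            this.query = query;
        }

        int size() {
            int matches = 0;
            for (int age : ages) {
                if (query.test(age)) matches++;
            }
            return matches;
        }
    }

    View where(IntPredicate query) {
        return new View(query);
    }

    public static void main(String[] args) {
        LiveAges db = new LiveAges();
        View puppies = db.where(age -> age < 2);
        System.out.println(puppies.size()); // 0
        db.add(1);
        System.out.println(puppies.size()); // 1
        db.set(0, 3);
        System.out.println(puppies.size()); // 0
    }
}
```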

The same auto-updating feature is at play when other threads update Realm’s data. When objects are updated in the other threads, the thread‑local objects will be updated in near–real time (meaning that if they’re currently in the process of doing something, the update will happen in the next iteration of the runloop). Furthermore, you can also force a data update via the Realm#refresh() operation.

This applies to all Realm objects & query result instances.

On the next iteration of the runloop (or when Realm’s refresh method is called), the Realm instance will be operating off of the most recent top-pointer available (most recent version of the Realm data).

This property of Realm objects and Realm query results not only keeps Realm fast and efficient, but it allows your code to be simpler and more reactive. For example, if your UI relies on the results of a query, you can store the Realm object or Realm query result in a field and access it without having to make sure to refresh its data prior to each access.

You can subscribe to Realm notifications to know when Realm data is updated, indicating when your app’s UI should be refreshed, without having to re-fetch your Realm query results. This feature is available in most Realm products, including Java, Objective‑C, & Swift, & is planned for React Native.

Getting Notified When Realm Data Changes

Auto-updating objects are an awesome feature, but they’re not that useful unless you know when an update happened so you can respond to it. Thankfully, Realm ships with a notification mechanism that allows you to react to changes in the Realm data.

Assume the following situation: There is an object that is used on the UI thread to display some UI values. A background thread makes a change to this object through some operation. Almost immediately (upon the next iteration of the runloop) the data in the UI thread object is updated (remember the object works directly off the core database, due to the zero‑copy architecture).

The background thread sends a notification message informing the UI thread that a change has been made via a Realm change listener. (This feature is available in most Realm products, including Java, Objective‑C, & Swift, & is planned for React Native.) At this point the UI thread can then refresh its view to display the new data, as in the diagram below.
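The flow above boils down to "commit first, then tell the listeners on their own thread". Here is a minimal sketch of that shape in plain Java. NotifyingStore is a hypothetical class, and the Executor passed in is our stand-in for "post this to the UI thread's run loop"; on Android that role would be played by something that posts to the main Looper.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical store that notifies listeners after each write; not Realm's API.
final class NotifyingStore {
    private final AtomicInteger value = new AtomicInteger();
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();
    private final Executor listenerThread; // where callbacks are delivered

    NotifyingStore(Executor listenerThread) {
        this.listenerThread = listenerThread;
    }

    void addChangeListener(Runnable listener) {
        listeners.add(listener);
    }

    void removeChangeListener(Runnable listener) {
        listeners.remove(listener);
    }

    int get() {
        return value.get();
    }

    // The data is committed before any listener hears about it,
    // so a listener that reads the store always sees the new value.
    void write(int newValue) {
        value.set(newValue);
        for (Runnable listener : listeners) {
            listenerThread.execute(listener);
        }
    }

    public static void main(String[] args) {
        // Runnable::run delivers callbacks inline; it keeps this demo single-threaded.
        NotifyingStore store = new NotifyingStore(Runnable::run);
        store.addChangeListener(() -> System.out.println("UI sees " + store.get()));
        store.write(3); // prints "UI sees 3"
    }
}
```

The listener does not receive the data as an argument. It just learns that something changed and reads the already-updated object, which matches the pattern in the Realm listener examples.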

Realm Notifications

A Threading Model to Ensure Safety

Often the following question surfaces:

“Why can’t Realm objects be passed across thread boundaries?”

It’s a valid question, and one whose answer is rooted in isolation and data consistency.

Since Realm is based upon a zero‑copy architecture, all objects are live and auto‑updating. If Realm allowed objects to be passed across threads, Realm would not be able to ensure data consistency because various threads could be attempting to change an object’s data at undetermined points in time. The data could become inconsistent very quickly. One thread may need to write to a value while another one is reading from it, and vice versa. This becomes problematic very quickly and you can no longer trust which thread has the correct object data.

Yes, this can be solved a number of ways and it is usually accomplished by applying locks over the objects, mutators, and accessors. While this does work, the locks become a painful performance bottleneck. The problem with locks, besides performance, is, obviously, that they lock - a long write transaction on a background thread will block a read transaction on the UI. If we were to utilize locks, we would lose the tremendous speed benefit and data consistency guarantee that Realm offers.

Therefore, the only limitation with Realm is that you cannot pass Realm objects between threads. If you need the same data on another thread, you just query for that data on the other thread. Or, even better, observe the changes using Realm’s reactive architecture! Remember - all objects are kept up‑to‑date between threads - Realm will notify you when the data changes. You just react to those changes. 👍
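Thread confinement itself is a simple idea: remember which thread created the object and refuse access from any other. The sketch below shows the pattern with a hypothetical ConfinedDog class. It illustrates the rule only and makes no claim about how Realm enforces it internally.

```java
// Hypothetical thread-confined object: usable only on the thread that created it.
final class ConfinedDog {
    private final Thread owner = Thread.currentThread();
    private int age;

    private void checkThread() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException(
                    "accessed from a thread other than the one that created this object");
        }
    }

    int getAge() {
        checkThread();
        return age;
    }

    void setAge(int age) {
        checkThread();
        this.age = age;
    }

    // Tries to read the dog from a second thread and reports whether that was refused.
    static boolean otherThreadIsRejected(ConfinedDog dog) {
        boolean[] rejected = new boolean[1];
        Thread other = new Thread(() -> {
            try {
                dog.getAge();
            } catch (IllegalStateException e) {
                rejected[0] = true;
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rejected[0];
    }

    public static void main(String[] args) {
        ConfinedDog dog = new ConfinedDog();
        dog.setAge(1);                                  // fine: same thread
        System.out.println(otherThreadIsRejected(dog)); // true
    }
}
```

Failing fast like this is far cheaper than locking every accessor, and it turns a subtle data race into an immediate, obvious error.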

Realm and Android

Since Realm is thread‑confined, you need to understand how to work with Realm in various Android Framework threading environments.

Realm and Android Framework threads

When working with Realm and Android background threads, you need to be sure that you’re opening and closing the Realm in those threads. Please see the Working with Android section in the documentation for how to best do that.

While Realm will work on Android’s main thread, we highly recommend that you use the Async API to ensure you do not encounter an ANR (Application Not Responding) error.

What if I want to run Realm on the main thread? Will it work?

Some operations are perfectly fine to run on the main thread, and some are not. The question now becomes…

When is it not OK to run Realm on the main thread?

This is where things get a bit murky. It depends on a number of factors. To avoid having to figure out when and if an operation is OK to run on the main thread, you should follow a common rule of thumb when working with Realm and Android’s main thread:

Use Realm’s Asynchronous API.

Using Realm’s Asynchronous API for queries and transactions

Recently, support for asynchronous queries and transactions was released. Both of these constructs allow you to easily compose asynchronous operations with a very easy‑to‑follow pattern.

Asynchronous queries with Realm

Asynchronous queries are quite easy to understand now that you’ve been exposed to the internal threading model.

Realm’s query methods can be suffixed with Async (e.g., findAllAsync()), and they will return immediately with a RealmResults or a RealmObject. The returned objects act as promises (very similar to the concept of a Java Future): the query executes in a background thread, and once it completes, the returned object is updated with the results of the query.

Below is an example asynchronous query:

private Dog firstDog;

private RealmChangeListener dogListener = new RealmChangeListener() {
    @Override
    public void onChange() {
        // called once the query completes and on every update
        // you can use the result now!
        Log.d("Realm", "Woohoo! Found the dog or it got updated!");
    }
};

Elsewhere in your application you’d have the code below (onCreate in this example).

firstDog = realm.where(Dog.class).equalTo("age", 1).findFirstAsync();
firstDog.addChangeListener(dogListener);

You’d also have this code in the onPause() method:

firstDog.removeChangeListener(dogListener);

In onCreate(), the findFirstAsync() method is called.

Calling findFirstAsync() will return the first RealmObject in the RealmResults. This method will return immediately; firstDog will not be hydrated with data until the query completes. To be notified when an async operation is completed, I’ve added a change listener to the firstDog RealmObject. This listener is no different than other listeners - I need to keep a reference to it so I can remove it later, which I’m doing in the onPause() method.

If at any time you want to check to see if your RealmObject or RealmResults has been loaded, you can use the isLoaded() method, e.g., firstDog.isLoaded().
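If the "returns immediately, fills in later" behavior feels abstract, the following sketch shows the same shape using the JDK's CompletableFuture. AsyncResult and findAsync are hypothetical names for this illustration. Unlike Realm, this sketch does not deliver its callback through a Looper or Handler, so the listener here may run on a background thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical promise-like result; an analogy for the pattern, not Realm's API.
final class AsyncResult<T> {
    private final CompletableFuture<T> future;

    private AsyncResult(CompletableFuture<T> future) {
        this.future = future;
    }

    // Starts the query on a background thread and returns right away.
    static <T> AsyncResult<T> findAsync(Supplier<T> query) {
        return new AsyncResult<>(CompletableFuture.supplyAsync(query));
    }

    // False until the background query has finished.
    boolean isLoaded() {
        return future.isDone();
    }

    // Runs the listener once the result is available.
    void addChangeListener(Runnable listener) {
        future.thenRun(listener);
    }

    // Blocks until loaded; handy for demos and tests, not for a UI thread.
    T load() {
        return future.join();
    }

    public static void main(String[] args) {
        AsyncResult<String> firstDog = AsyncResult.findAsync(() -> "Rex");
        firstDog.addChangeListener(() -> System.out.println("Woohoo! Found the dog!"));
        System.out.println(firstDog.load()); // Rex
    }
}
```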

Caveat for threads without an Android Looper: The async query needs to use the Realm’s Handler in order to deliver results consistently. Trying to call an asynchronous query using a Realm opened inside a thread without a Looper will throw an IllegalStateException.

Writing data to Realm with asynchronous transactions

Writing to Realm asynchronously is a breeze with the new asynchronous transaction support. The asynchronous transaction support works the same way as the current executeTransaction, but instead of opening a Realm on the same thread, it will give you a background Realm opened on a different thread. You can also register a callback if you wish to be notified when the transaction completes or fails.

Implementing an asynchronous transaction is simple:

realm.executeTransactionAsync(new Realm.Transaction() {
    @Override
    public void execute(Realm realm) {
        Dog dog = realm.where(Dog.class).equalTo("age", 1).findFirst();
        dog.setName("Fido");
    }
}, new Realm.Transaction.OnSuccess() {
    @Override
    public void onSuccess() {
        Log.d("REALM", "All done updating.");
        // log the name of the thread this callback runs on
        Log.d("REALM", Thread.currentThread().getName());
    }
}, new Realm.Transaction.OnError() {
    @Override
    public void onError(Throwable error) {
        // transaction is automatically rolled-back, do any cleanup here
    }
});

The last two parameters in the code above, new Realm.Transaction.OnSuccess() and new Realm.Transaction.OnError(), are optional. The callbacks are provided here so that a developer can be notified when the transaction completes or errors. The executeTransactionAsync method accepts a Realm.Transaction object that gets executed on a background thread.

Override the execute method and perform your transactional work in this method - this is the code that gets executed in the background thread. The execute method provides a Realm to work with. This Realm instance was created by the executeTransactionAsync method and is the Realm from the background thread. Simply use this Realm instance to find or update the item(s) of interest (in this case a dog), and that’s it! The executeTransactionAsync method calls beginTransaction and commitTransaction for you in the background, so you do not have to.

Once the operation is complete the onSuccess method of the Transaction.OnSuccess class will be executed. If an error occurs, the exception will be delivered to the onError method for you to handle. It is important to note that if the onError method is invoked, the Realm transaction will have been rolled back due to the error.
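The begin, commit, and roll back bookkeeping described above can be pictured with a small stand-in written against the JDK's ExecutorService. AsyncTx and executeAsync are hypothetical names. The "database" is just a list, and in this sketch the callbacks run on the background thread rather than being posted back to the caller's thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical async transaction runner; a sketch of the pattern, not Realm's code.
final class AsyncTx {
    private List<String> committed = new ArrayList<>();

    synchronized List<String> read() {
        return new ArrayList<>(committed);
    }

    Future<?> executeAsync(ExecutorService background,
                           Consumer<List<String>> body,
                           Runnable onSuccess,
                           Consumer<Throwable> onError) {
        return background.submit(() -> {
            RuntimeException failure = null;
            synchronized (this) {                                 // one writer at a time
                List<String> working = new ArrayList<>(committed); // "beginTransaction"
                try {
                    body.accept(working);
                    committed = working;                          // "commitTransaction"
                } catch (RuntimeException e) {
                    failure = e;                                  // rollback: copy is discarded
                }
            }
            if (failure == null) {
                onSuccess.run();
            } else {
                onError.accept(failure);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService background = Executors.newSingleThreadExecutor();
        AsyncTx db = new AsyncTx();
        db.executeAsync(background,
                rows -> rows.add("Fido"),
                () -> System.out.println("All done updating."),
                error -> System.out.println("Rolled back: " + error)).get();
        System.out.println(db.read()); // [Fido]
        background.shutdown();
    }
}
```

Because the body only ever touches a working copy, a thrown exception needs no undo logic. The copy is simply dropped and the committed data is exactly what it was before.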

Lastly, Realm holds a strong reference to the Transaction. If you need to cancel the transaction for any reason (stopping of an Activity and/or Fragment), simply assign the result to a RealmAsyncTask instance and cancel it elsewhere.

RealmAsyncTask transaction = realm.executeTransactionAsync(...);

...

// cancel this transaction. eg - in onStop(), etc
if (transaction != null) {
    transaction.cancel();
}

Realm’s asynchronous transaction support also requires Realm’s Handler to deliver the callback (if you’re using one). If you start an asynchronous write from a Realm opened from a non-Looper thread, you won’t get the notification.

* There are exceptions to everything. You can benchmark your code with System.nanoTime() and get a feeling for the speed if you need to. Please note that benchmarking Java is a very complicated topic, as the VM can step in at arbitrary times to garbage collect, so please treat System.nanoTime() as a basic gauge only.
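A rough timing helper along those lines might look like the sketch below. RoughTimer is a hypothetical name, and taking the best of several runs is just one simple way to dampen noise from garbage collection and JIT warm-up. Treat the numbers as a ballpark, not a proper benchmark.

```java
// Hypothetical helper for quick, rough timings with System.nanoTime().
final class RoughTimer {

    // Elapsed wall-clock nanoseconds for a single run of the given work.
    static long elapsedNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    // Runs the work several times and keeps the fastest run.
    static long bestOf(int runs, Runnable work) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            best = Math.min(best, elapsedNanos(work));
        }
        return best;
    }

    public static void main(String[] args) {
        long nanos = bestOf(5, () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) throw new AssertionError(); // keeps the loop from being discarded
        });
        System.out.println("fastest run: " + nanos + " ns");
    }
}
```

If the operation you measure on the main thread regularly takes more than a few milliseconds, that is a good hint it belongs in the asynchronous API.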

Low Friction. Low Overhead. Lots of Advantages.

Realm was developed to help you in your day‑to‑day environment.

We developed Realm to make development around your persistence layer easier and more enjoyable. We sought to solve some of the very hard problems with data consistency, performance, and threading. The only thing you have to be careful about is to make sure you don’t pass objects across threads. That’s it!

If that’s all you have to do to ensure consistency and still gain the performance benefits and reactive programming features of Realm…well…that’s not a bad tradeoff at all. ☺️


About the content

This content has been published here with the express permission of the author.
