Some of the more interesting Android fragmentation issues (to us, anyway) have nothing to do with screen sizes or CPU speeds. Instead, we usually find them rooted in something as arcane as provider- or hardware-specific implementations of what you could reasonably assume are “standard” parts of the stack. As your Android app (or in our case, our library) gets deployed to hundreds of millions of devices, you’ll see more and more things start breaking, including things you never expected could break in the first place. Here’s a particularly “fun” one we encountered recently.
Realm supports AES-256 encryption out of the box: add one method call, .encryptionKey(key), when you create your Realm, and Realm will encrypt your data as you save it and decrypt it as you read it. (We didn’t write our own crypto implementation, preferring instead to piggyback on the standard crypto libraries provided with Android.) Recently, and mysteriously, we noticed that our encryption calls were failing 100% of the time for a subset of the end users of one of our biggest apps. Specifically, it only seemed to happen to users of the relatively recent LG G3 running KitKat, but not all of them… After a bit more digging, we found that only models delivered to a certain country were affected. What the hell?
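For readers who haven’t used the feature, here is a minimal sketch of producing a key to hand to .encryptionKey(key). It assumes Realm Java’s requirement of a 64-byte key; the class and method names are hypothetical, and the RealmConfiguration call appears only as a comment because it needs the Realm library (and, depending on the Realm version, an Android Context). A real app would also need to persist the key securely, for example via the Android Keystore, which this sketch leaves out.

```java
import java.security.SecureRandom;

public class RealmKeyExample {
    // Assumption: Realm Java rejects encryption keys that are not exactly 64 bytes.
    static final int REALM_KEY_LENGTH = 64;

    // Fills a 64-byte array with cryptographically strong random bytes.
    static byte[] generateKey() {
        byte[] key = new byte[REALM_KEY_LENGTH];
        new SecureRandom().nextBytes(key);
        return key;
    }

    public static void main(String[] args) {
        byte[] key = generateKey();
        System.out.println("Generated key of " + key.length + " bytes");

        // With Realm on the classpath, the key would be passed roughly like this
        // (Builder constructor arguments differ between Realm versions):
        // RealmConfiguration config = new RealmConfiguration.Builder()
        //         .encryptionKey(key)
        //         .build();
        // Realm realm = Realm.getInstance(config);
    }
}
```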
After working with the affected app’s developers, we finally got our hands on a device, which let us reproduce an issue we otherwise could only see in the Crashlytics dashboard. Using the strace tool, we were able to determine that the OS was attempting to load a file called libXYZdrm_sf.so (where ‘XYZ’ are the initials of a major telecommunications provider) instead of the original crypto library that ships with Android. On further analysis, we found references to the telco’s name in the libXYZdrm_sf.so file. As noted earlier, this bug only surfaces on this telco’s specific model of the phone. We initially suspected that the telco had made some manual customization that introduced a deadlock, causing the failures our users were seeing when trying to use Realm. What we eventually found is that various messy incarnations of this crypto library were being loaded depending on device, OS version, and so on. We decided to resolve the issue by including the crypto library we need directly in Realm. (Thankfully, this library has a very small footprint, so the impact was not substantial.)
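Because the failure lived in a system-supplied crypto library rather than in app code, one defensive option is a startup self-test that checks the platform’s crypto against a known answer. The sketch below is a swapped-in illustration, not what Realm does: it exercises the Java (JCA) AES implementation rather than the native library Realm loads, using the AES-256 test vector from FIPS-197 Appendix C.3, and it returns the name of the provider that served the request so that value could be attached to crash reports. Class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class CryptoSelfTest {
    // AES-256 known-answer vector from FIPS-197, Appendix C.3.
    static final byte[] KEY = hex("000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f");
    static final byte[] PLAINTEXT = hex("00112233445566778899aabbccddeeff");
    static final byte[] EXPECTED = hex("8ea2b7ca516745bfeafc49904b496089");

    // Decodes a lowercase hex string into bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Encrypts and decrypts one block; returns the provider name, or throws if either result is wrong.
    static String runAesKnownAnswerTest() {
        try {
            // ECB with no padding processes exactly one 16-byte block, matching the vector's layout.
            Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
            SecretKeySpec keySpec = new SecretKeySpec(KEY, "AES");

            cipher.init(Cipher.ENCRYPT_MODE, keySpec);
            byte[] ciphertext = cipher.doFinal(PLAINTEXT);
            String provider = cipher.getProvider().getName();
            if (!Arrays.equals(ciphertext, EXPECTED)) {
                throw new IllegalStateException("AES-256 encryption mismatch, provider " + provider);
            }

            // Also check the inverse direction, so a broken decrypt path is caught.
            cipher.init(Cipher.DECRYPT_MODE, keySpec);
            byte[] roundTrip = cipher.doFinal(ciphertext);
            if (!Arrays.equals(roundTrip, PLAINTEXT)) {
                throw new IllegalStateException("AES-256 decryption mismatch, provider " + provider);
            }
            return provider;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-256 self-test could not run", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("AES-256 OK via provider: " + runAesKnownAnswerTest());
    }
}
```

A check like this catches wrong output or an exception; a library that hangs, as with the deadlock we first suspected, would additionally need a timeout around the call.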
You might think this is a one-off, but across the many apps using Realm with nine-figure install counts, we’ve been “lucky” enough to observe weird conditions like this on almost every part of the stack, from CPU families to WebView implementations.
Let’s be real: we have entered an era where traditional Android components (OS, kernel, hardware, etc.) are no longer as stable as they once were. They’re becoming fragile. For app developers, this raises the need to treat everything in the stack with suspicion when debugging cases like this one. Telcos are in this arena too. It’s old news that telcos modify the OS to suit their branding needs, but it is fairly rare for them to modify low-level components such as cryptographic libraries. However, it is happening, and developers need to keep a sharp eye out for every possible root cause nowadays.
What’s the worst bug you’ve found in an unexpected part of the stack? Leave a comment or follow the conversation on HN.
About the content
This content has been published here with the express permission of the author.