Resiliently Uploading Logs on Mobile

Introduction

I’m Evan Kimia, and I’m going to talk about Relay, which is a logger for CocoaLumberjack.

CocoaLumberjack is a mature logging framework; it’s been around for a long time. You can have different loggers, which do different things - e.g. logging straight to your console; logging directly to a file; logging into a Realm database and then repeatedly trying to upload it until it succeeds.

Zero

I work at Zero, where we’re trying to start a bank. I came from Postmates, a medium-sized startup, and before that I was at a bigger company, Zynga. When you’re at a big company, you get many things without necessarily realizing how good you have it (e.g. a bunch of logging frameworks to use). Then you go to a medium-sized company, those things are missing, and your boss is telling you to finish something as soon as you can. Spending time to build something that makes your job more efficient slips through the cracks. You find a bug, try to figure out what’s going on, and spend hours on it because you don’t have logs. Then you realize you’ve wasted a lot of time when a nice auditable log of actions would have led you to the problem sooner.

Relay

Relay is a logger for CocoaLumberjack, and it is designed for mobile. On mobile, if you have a bad connection or no service, a log that is sent once and forgotten is not going to make it up to your server or to a third party logging service.

An Overview of Logging

The benefits of logging are tremendous.

Context around bugs

Let’s say you have an iOS crash, you’re trying to figure out what’s up, and the crash happens with a newer version of iOS. If you had a proper logging system in place, you could easily sort and segment your logs by the OS versions that people are running.

Or let’s say a customer reports a bug: “I’m trying to order this burrito. It’s not working, this thing keeps crashing.” It’s the only person this happens to, and you can’t reproduce it. Having a solid logging platform in place means being able to put in that person’s e-mail address and see everything they’ve done, not only in your front end application but on the back end as well.

Analyzing application performance

There are tools to analyze the performance of your applications on the back end and the front end. A high level solution like Mixpanel is easy to use and can do things like measure the performance of certain methods. Other tools like Librato are good too, and pretty much any one of the big third party log aggregators offers nice dashboards that help you analyze and digest your logs.

User Insights

At the end of the day, you’re still looking at a log about something. It could be about a crash or some warning, but it could also be an event, like the user just viewed this confirmation screen. That helps the product team figure out how certain things are working out: you could use it for A/B testing or for product reviews, to see how your product is doing. That has its own slew of tools, but you have to start with the basics.

Logging basics

Log levels

These are the standard five levels, from verbose to error.

You need to spend some time figuring out what constitutes an error, a warning, an info log, a debug log, etcetera. Error is probably one of the easier ones. If you don’t do that, all these cool tools that exist for analyzing this data won’t work.
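To make the ordering concrete, here is a minimal Swift sketch of the five levels and a filter by minimum severity. The `LogLevel` and `LogEntry` types and the `entries(_:atOrAbove:)` helper are made up for illustration; they are not CocoaLumberjack’s own types.

```swift
/// The five levels from the talk, ordered from least to most severe.
enum LogLevel: Int, Comparable {
    case verbose, debug, info, warning, error

    static func < (lhs: LogLevel, rhs: LogLevel) -> Bool {
        return lhs.rawValue < rhs.rawValue
    }
}

/// Hypothetical log record, for illustration only.
struct LogEntry {
    let level: LogLevel
    let message: String
}

/// Keeps only the entries at or above `minimum`.
func entries(_ all: [LogEntry], atOrAbove minimum: LogLevel) -> [LogEntry] {
    return all.filter { $0.level >= minimum }
}

let sample = [
    LogEntry(level: .verbose, message: "Rendered cell 12"),
    LogEntry(level: .info, message: "User viewed confirmation screen"),
    LogEntry(level: .warning, message: "Request retried after timeout"),
    LogEntry(level: .error, message: "Payment request failed"),
]

// Only the warning and the error survive the filter.
let important = entries(sample, atOrAbove: .warning)
print(important.map { $0.message })
```

Deciding up front which situations map to which case is the part that takes thought; once that is settled, dashboards and alerts can filter on the level alone.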

For example, for my app I’m doing all this groundwork up front. If a bunch of warning logs come up within a certain amount of time, let’s say 20 minutes, it triggers an alarm and sends a message directly to Slack for us to react to. That’s extremely useful when you first launch your product, and it plays nicely with iOS 11, where Apple is finally letting developers do rolling updates: you can distribute your app to 1% of people and then expand that, assuming things look good. We’re going to start at 1% and hope Slack doesn’t trigger any alarms. This extends out into more serious situations.
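The alarm rule described here is a sliding-window count. A minimal sketch, assuming a threshold of three warnings (the talk gives the 20 minute window but no threshold) and a hypothetical `WarningAlarm` type; in practice this logic would live in the log aggregator on the server, not in the app:

```swift
import Foundation

/// Hypothetical sliding-window alarm: fires when at least `threshold`
/// warnings have arrived within the last `window` seconds.
struct WarningAlarm {
    let threshold: Int
    let window: TimeInterval
    private var timestamps: [TimeInterval] = []

    init(threshold: Int, window: TimeInterval) {
        self.threshold = threshold
        self.window = window
    }

    /// Records one warning at time `now` (in seconds) and returns true
    /// if the alarm should fire.
    mutating func recordWarning(at now: TimeInterval) -> Bool {
        let cutoff = now - window
        timestamps.append(now)
        // Forget warnings that have fallen out of the window.
        timestamps.removeAll { $0 < cutoff }
        return timestamps.count >= threshold
    }
}

var alarm = WarningAlarm(threshold: 3, window: 20 * 60)
let times: [TimeInterval] = [0, 300, 600]
// The third warning inside 20 minutes trips the alarm.
let fired = times.map { alarm.recordWarning(at: $0) }
print(fired) // [false, false, true]
```

When the function returns true, the aggregator would post to a Slack webhook or page someone; that step is left out of the sketch.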

For error logging, if you’re seeing many different errors at high volume and your thresholds are low, that’s bad, and you should react to those situations. PagerDuty and Opsgenie ping developers when something bad happens, and good log aggregating tools will fire that message directly to Opsgenie, PagerDuty, etc. It’s just a webhook. This means that if you think about logging beforehand, not only is your life going to be easier, but you’ll also be more efficient at your job, and hopefully less stressed out if anything breaks.

Hosting a log service vs. 3rd party solutions

Your data has to go somewhere. You can either host it yourself or use one of the third party logging solutions.

If you are hosting your own logging solution, you do get some fun benefits. For example, if I have my own first party logging platform and I get a log from the client, I can inject information that the client doesn’t have, because the log is going to my API. That means I can inject parts of the network request the client was using, or sensitive information that I don’t want the client to have but that would be useful for debugging. That is important for me given that I’m trying to start a bank: I can’t have the client holding too much information, since giving someone more information than they need is a security risk on its own.

Back to Relay

I made Relay for a few reasons. One of them is reliability: I want to be able to record logs and have them get to my server, or wherever they need to go.

I didn’t find many good solutions that were lightweight to use and flexible enough to get that data to where I need it to go, which is a big problem with mobile logging. Say you’re a user using an app and you do a whole bunch of random stuff. You’ve probably generated 50 logs, let’s say. And your service isn’t that great, because you were playing a game in an elevator. Something bad happens. If logging is just fire and forget, the developer is never going to see that log, assuming your service is bad, and it probably is if you’re stuck in an elevator, let’s be real.

I wanted to leverage NSURLSession background sessions, so the application could pass each log to the system as soon as possible. I wanted the most reliable way to upload logs. That may sound silly, but it matters given what I just said about efficiency, and especially for user analytics: if you’re missing something in user analytics, the data could tell a different story. I’m particular about getting all my logs in one place.

Another reason is extensibility. Many frameworks provided by the big third party services are designed to go only to their own service. They’re not flexible about what additional information you can include, and if you have a first party logging platform, there could be a variety of configurations you need to account for. I wanted to make something that you could easily inherit from to make your own configuration for uploading logs. For example, let’s say your auth token changes. You want to tell all the logs that are in the queue, “hey, I have a different token, and you’re going to need it to get where you need to go.” It sounds simple, but it’s something I haven’t found a substitute for.

Logging in regulated environments

For financial services or for healthcare, there are a bunch of laws around data security - e.g. making sure that the partners you use have something called a SOC 2 report (a whole audit of them saying that they’re safe and that they follow best practices in terms of security).

Even some companies that we think of as big don’t have a SOC 2, and that limits my options. In addition, if I were recording something sensitive (a credit card number, for example, though I never would record that), I’d need to make sure it’s secure. So we have a first party solution for logging, and therefore I needed a good way of uploading logs to it.

In Relay, you have a log statement that’s called in your code. At some point CocoaLumberjack calls a flush event and tells all the loggers, including Relay, “time to push this out”. Relay has a small Realm database that it records the logs directly into. That one file has a few huge benefits. For one, it’s encrypted if you want it to be; for me, even my debug logs are encrypted. For two, it helps maintain the state of where all these logs are at a given time.

After the log is added to the database, a temporary file is created for that log entry. The file is associated with a background upload task, which is immediately passed to the system. The cool thing about background NSURLSession tasks is that they persist for a while. Apple doesn’t give a number, but for roughly a week or so the task will just chill on the phone, waiting for sufficient power and a good Internet signal to upload that log. It’s perfect for what I want to do.
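The hand-off to the system can be sketched with Foundation’s `URLSession` on Apple platforms. This is not Relay’s code: the function names, the session identifier, the JSON content type and the `UploadDelegate` class are assumptions made for illustration. The log body is written to a file first because background sessions only support uploads from a file.

```swift
import Foundation

/// Builds the POST request for one log entry. The JSON content type is an assumption.
func makeUploadRequest(endpoint: URL) -> URLRequest {
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    return request
}

/// Writes the log body to a temporary file for the system to upload later.
func writeTemporaryFile(for logBody: Data) throws -> URL {
    let fileURL = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
    try logBody.write(to: fileURL, options: .atomic)
    return fileURL
}

/// Receives the system's report once a task finishes, possibly much later.
final class UploadDelegate: NSObject, URLSessionTaskDelegate {
    func urlSession(_ session: URLSession, task: URLSessionTask,
                    didCompleteWithError error: Error?) {
        let status = (task.response as? HTTPURLResponse)?.statusCode
        print("task \(task.taskIdentifier) done, status: \(String(describing: status)), error: \(String(describing: error))")
    }
}

/// Creates the background session. The identifier is a placeholder.
func makeBackgroundSession(delegate: URLSessionTaskDelegate) -> URLSession {
    let configuration = URLSessionConfiguration.background(
        withIdentifier: "com.example.log-upload")
    return URLSession(configuration: configuration, delegate: delegate, delegateQueue: nil)
}

/// Hands one log entry to the system and returns immediately.
func enqueueUpload(of logBody: Data, to endpoint: URL, using session: URLSession) throws {
    let fileURL = try writeTemporaryFile(for: logBody)
    let request = makeUploadRequest(endpoint: endpoint)
    session.uploadTask(with: request, fromFile: fileURL).resume()
}
```

After `resume()` the transfer belongs to the system, so it can continue even if the app is suspended or terminated by the system; the delegate callback is where the success or failure of each log would be recorded.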

Once the task is passed to the system and the system tells me the log has been successfully uploaded, the log is removed from the database. If it fails (e.g. let’s say you’re on your way down in the elevator this time, playing your game), Relay updates that record in the database to say this one is still not done and we have to do something with it. There is a parameter, if you want it, for the number of retries for a log before Relay finally gives up and says, you know what, sorry, can’t do it.
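The bookkeeping described above amounts to a small state machine per log. A minimal in-memory sketch, where `PendingLog`, `UploadQueue` and `maxRetries` are hypothetical names standing in for the records Relay keeps in its Realm database:

```swift
/// Hypothetical stand-in for one persisted log record.
struct PendingLog {
    let id: Int
    let message: String
    var failedAttempts = 0
}

/// In-memory sketch of the bookkeeping; a real implementation would persist it.
struct UploadQueue {
    let maxRetries: Int
    private(set) var pending: [Int: PendingLog] = [:]
    private(set) var abandoned: [Int] = []

    init(maxRetries: Int) {
        self.maxRetries = maxRetries
    }

    mutating func add(_ log: PendingLog) {
        pending[log.id] = log
    }

    /// Called when the system reports that an upload task finished.
    mutating func uploadFinished(id: Int, succeeded: Bool) {
        guard var log = pending[id] else { return }
        if succeeded {
            pending[id] = nil            // uploaded: remove the record
            return
        }
        log.failedAttempts += 1
        if log.failedAttempts > maxRetries {
            pending[id] = nil            // out of retries: give up
            abandoned.append(id)
        } else {
            pending[id] = log            // still not done: keep for a retry
        }
    }
}

var queue = UploadQueue(maxRetries: 2)
queue.add(PendingLog(id: 1, message: "order failed"))
queue.add(PendingLog(id: 2, message: "screen viewed"))
queue.uploadFinished(id: 2, succeeded: true)
queue.uploadFinished(id: 1, succeeded: false)
print(queue.pending.keys.sorted()) // [1]
```

Keeping the failure count on the record itself is what lets the retry limit survive app restarts once the record is persisted.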

The way that works is essentially hooking Relay up to a few app delegate methods, and then you are good to go. The system will wake up your app when need be, and you don’t need any special permissions for this. It’ll just say, this task is finished, do what you will.

Uploading Logs

I was talking about extensibility and flexibility in getting these logs uploaded to your server, and I did this by abstracting the upload part. I could have just had, let’s say, a delegate method that says, “we’ve got this request, do you want to modify the network request at all?” But that’s not necessarily the most flexible thing you can do.

Instead, I made a separate object, which I just called the remote configuration. It specifies how and where these logs get uploaded. You can specify an arbitrary endpoint, whatever you want, and you can specify what indicates a successful upload. For us, that’s a 202 Accepted. You have the flexibility to do whatever you want.
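As a rough illustration of what such an object might hold, here is a sketch with an endpoint, extra headers, and the set of status codes that count as success. The `RemoteConfiguration` type, its property names and the example URL are assumptions, not Relay’s actual API.

```swift
import Foundation

/// Hypothetical configuration object; names are illustrative only.
struct RemoteConfiguration {
    var endpoint: URL
    var headers: [String: String] = [:]
    /// Which HTTP status codes mean "this log made it".
    var successStatusCodes: Set<Int> = [202]

    func isSuccessfulUpload(statusCode: Int) -> Bool {
        return successStatusCodes.contains(statusCode)
    }
}

let config = RemoteConfiguration(
    endpoint: URL(string: "https://logs.example.com/v1/logs")!,
    headers: ["Authorization": "Bearer <token>"]
)

print(config.isSuccessfulUpload(statusCode: 202)) // true
print(config.isSuccessfulUpload(statusCode: 500)) // false
```

Putting the success criterion in the configuration, instead of hard-coding "any 2xx", is what lets a service that answers with 200 or 204 be supported by changing one value.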

The fun part about this is that if you ever change the configuration, it updates all the logs that haven’t been uploaded yet, including the ones already passed to the system. If a log was already in the system, Relay cancels that task, packs the log into another request with the appropriate headers, and shoots it back up.

At Zero we use OAuth 2. Every time the auth token changes, I get a delegate callback, I update my configuration with the new header value, and I’m done. The log’s going to get to where it needs to go.
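The token-change flow can be sketched as "update the headers, then re-issue everything still in flight". The `Uploader` and `InFlightUpload` types below are hypothetical, and the sketch only models the bookkeeping; a real implementation would also cancel the old URLSession task and start a new one.

```swift
/// Hypothetical in-flight upload: which log it carries and the headers it was sent with.
struct InFlightUpload {
    let logID: Int
    var headers: [String: String]
    var generation = 0   // bumped each time the upload is cancelled and re-issued
}

final class Uploader {
    private(set) var headers: [String: String]
    private(set) var inFlight: [InFlightUpload] = []

    init(headers: [String: String]) {
        self.headers = headers
    }

    func enqueue(logID: Int) {
        inFlight.append(InFlightUpload(logID: logID, headers: headers))
    }

    /// Called from an auth-token-changed callback.
    func updateAuthToken(_ token: String) {
        headers["Authorization"] = "Bearer \(token)"
        let current = headers
        // Re-package every pending upload with the new headers.
        inFlight = inFlight.map { old in
            InFlightUpload(logID: old.logID, headers: current, generation: old.generation + 1)
        }
    }
}

let uploader = Uploader(headers: ["Authorization": "Bearer old-token"])
uploader.enqueue(logID: 1)
uploader.enqueue(logID: 2)
uploader.updateAuthToken("new-token")
print(uploader.inFlight.map { $0.headers["Authorization"] ?? "" })
```

The caller only touches the configuration; nothing outside the uploader needs to know which logs were already handed to the system.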

Another fun thing about the remote configuration being its own object is that someone who’s using a third party solution can just make a subclass of it for their provider and make it available. I don’t know if we’re going to have those as a gist or a submodule, but we would advertise them.

If you’re using one of these four, for example, you just switch the class of the remote configuration you’re using, and you’re good to go. It’s simple.

Setup

First, you set up your configuration: I want to hit this endpoint, and I want Relay to know that if it gets a 202 back, that means it’s okay. Then you instantiate your Relay logger. You could have multiple loggers; it’d be useful to have one for your debug information and a separate one for your user analytics.

Last, you have to hook it up for the background session. When the system wakes up your app and tells you that your log has finished uploading or failed to upload, you should probably do something about it.

Log Away

Finally, you add it to CocoaLumberjack. Then you just log away, do whatever you want. You can be confident that your data will get to where it needs to go.

Down the Line

I’m working on a project that hasn’t launched yet, so I’m slowly working on Relay until it gets to 1.0, around the same time that Zero will launch. But help from the open source community never hurts.

I do want more of a focus on user analytics. CocoaLumberjack is useful and mature, but it would be great to leverage this technology for reliably uploading logs in something friendlier for user analytics. If you look at other platforms for user analytics, they go by just events and properties. It’s simple. And there’s no reason why that same simplicity can’t be added to Relay with a different way of calling these logs.
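The "events and properties" shape is small enough to sketch directly. The `AnalyticsEvent` type and the example event below are hypothetical; they show the kind of payload such an interface would serialize, not anything Relay ships today.

```swift
import Foundation

/// Hypothetical analytics event: a name plus a flat bag of string properties.
struct AnalyticsEvent: Codable, Equatable {
    let name: String
    let properties: [String: String]
}

/// Encodes an event as JSON with stable key order, ready to be queued for upload.
func encode(_ event: AnalyticsEvent) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return try encoder.encode(event)
}

let event = AnalyticsEvent(
    name: "confirmation_viewed",
    properties: ["screen": "checkout", "variant": "B"]
)
let payload = try encode(event)
print(String(decoding: payload, as: UTF8.self))
```

Once an event is just a blob of data like this, the same record, upload, and retry path used for ordinary logs can carry it.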

Centralizing your data is enough of a problem that there are startups that exist to solve it. For example, with mParticle and Segment, you send all of your logs to them, and from there they’ll send them to whatever third party platform you wish to use, e.g. Mixpanel. Centralizing your data can help, which is why I want to push this more toward user analytics. Otherwise it’s a huge pain trying to pull valuable data from each one of those platforms to do your own analysis on, to run your own dashboards on.

Connect with us

If you would like to connect with us, drop me an e-mail. Also, I am looking for a director of iOS!

About the content

This content has been published here with the express permission of the author.

Evan Kimia

Evan has 15+ years of software engineering experience. Prior to joining Zero, Evan was a mobile engineer at Postmates, where he managed the iOS Postmates mobile app, improved the architecture of the Android app, and helped to hire and scale their engineering and product teams from 10 to 80+. Prior to that, Evan worked as a senior software engineer at Zynga on a game with 4 million daily active users. Evan has a B.S. from the University of Connecticut.