AltConf: Ben Sandofsky

Periscope's Sketch Feature: Prototype to Production Code

Periscope’s new “Sketch” feature lets you draw on live video. This AltConf talk takes you behind the scenes, from prototype to production code, with all the bugs along the way. If you’ve never worked in low-level graphics, this talk is for you.

Introduction (0:00)

A reason a lot of you are probably coming to this conference or WWDC is that you want to learn all about new things, new frameworks and new processes. It’s really important to be challenging yourself all the time, but you end up reaching a plateau, and then you can fall into a trap where you’re learning lateral things. Where you’re not really advancing your skill set, you’re just learning new things for the sake of doing things differently. So when I’ve run into that, what’s nice is to take a step back and ask okay, okay, let’s take a broader look at the scope of what I’m learning.

Now one way of doing it, since I think everyone here calls themselves an iOS developer, is to say, “Well, maybe it’s time to learn Android, and see if I can pull in any techniques that they’re using on their platform.” But then again, Android and iOS are more similar than they are different nowadays. So maybe you could take a step back and look higher at more abstract things like design, and learn better ways to communicate with your designer or anticipate feature requests before they come up.

Or if you want to develop a specialization, you could go deeper into the stack. Learn something lower level, like the theory behind security. At the end of last year, I was asking myself, what if I learned more about OpenGL so that what Core Animation is doing is less mystical to me? So I was learning about OpenGL by building some toy apps, but I hadn’t shipped anything into production. By coincidence, a few peers I’d worked with a while ago had gone on to work at Periscope, and they reached out to me. They were looking to develop some new ideas and features.

Now they’re really busy over there developing important priority features like the recently launched content moderation or comment moderation system which is crucial for building a safe environment for broadcasters. And as a growing social network, you need to prioritize things like that but, it’d be nice to have some time to sort of make bets on new things, right? And so they asked me if I’d be interested in prototyping some new stuff and then bring it to production if it seems good.

Today we’re going to take a feature from pitch to production. And hopefully, I’ll explain some of the dead ends and bugs I ran into along the way. If you aren’t interested in any of the product stuff, my simple goal today is to get you more excited about leveraging the GPU to solve certain classes of problems, and to give you a better understanding of what iOS is doing at the lower levels of its graphics system.

The way that this new feature was pitched was “We think we wanna draw on video.” And I’m like, “That sounds pretty cool.” But they hadn’t really answered a few questions about the feature; it was still a clouded idea.

The Prototype (3:27)

The first thing you do rather than invest potentially weeks or months developing something is throw together a prototype to answer the most important questions before you build the real system. The questions were:

  1. Do we even want this feature? Okay, is this going to be something that broadcasters will find value in or is it going to be a gimmick? You’re going to use it once and then you have to support this feature now until the end of time. And there’s going to be a button in the UI that no one’s going to care about. Is it worthwhile to build?
  2. If you do build it, should the drawings be baked into the video or do you provide them as a separate layer on the video and a separate stream that you could toggle on and off?

Now I am not a professional “prototyper”. There’re some great WWDC videos that talk about Apple’s prototyping process like Developing for Future Hardware.

The basic idea is, you ask what’s the minimum you need to implement to answer these questions. It might just be a concept video, or it could be a paper prototype that isn’t really coded. But what is the minimum we actually need here to know whether it’s a worthwhile feature? Well, we do need to build code. We want to see what the broadcasting experience could be and make sure it’s not super complicated; it’s delightful. And we probably need viewers to see if there’s anyone who’s getting value out of it. That doesn’t mean that we need to roll this out to like 1% of actual production users, we could just build a hacky version, give it to a dozen employees and try it out and immediately we’ll know if there’s value.

The Current Architecture (4:58)

Alright, so do we build a separate app that’s just a sketching app, is that fast? Or could we hack something janky into the main app on a throwaway branch? The existing architecture of Periscope is pretty straightforward. There are just two streams. One of them is the live video, that’s 320 by 568, and there’s also a WebSocket connection that’s receiving JSON objects that represent the hearts, the comments, or other important data that’s going to be affecting the video. And because the two will not arrive at the same time (the JSON will usually arrive a little early), there’s NTP code to synchronize when these events should take place on screen.

What’s really interesting is that, on staging servers, it’s open so that you could send arbitrary JSON payloads for prototyping stuff like this. So I could create my own JSON object, push it up and on beta clients that are pointing to the staging servers use it on my hacky branch.

It took less than three days to hack something together, and this is what we ended up with. And this isn’t even using OpenGL, this is using CAShapeLayer. I have a really silly naive model that represents the drawing, where there’s a drawing object and stroke objects that belong to the drawing. And once every second, I just take a snapshot of this drawing model, serialize it to JSON and send it upstream. And with CAShapeLayer, you just assign a CGPath to each shape layer, so there’s a separate shape layer for each stroke, and you can set the stroke color. And most importantly, strokeStart and strokeEnd are animatable in Core Animation, so you don’t have to worry about writing your own draw loop or getting too deep into there. And we immediately received the feedback that we needed.


So the first question was, should we build it? Yes, this is a lot of value, it’s great, let’s go ahead and do it. And as we were using it, we were able to think a bit more about the second question, whether or not the drawing should be part of the video.

Thoughts on Drawings (6:57)

What we concluded was that drawings are essential to the message. So what happens if, at the end of your broadcast, you tap the save to camera roll button, and there aren’t drawings there? It doesn’t make as much sense. Also, how do you handle making sure there’s maximum distribution of the video on other platforms like web or Android? We could write a renderer for each platform, but then what happens if they want to broadcast on television?

There are a lot of material challenges to shipping renderers to each platform. And how do you iterate on the drawing schema? Could a change break drawings that were made a year ago? While it seems technically superior to have a separate stream that carries all the drawings, it felt like a more challenging problem that doesn’t actually deliver any value for users, and it makes things more complicated when you start dealing with long-term archival of the video.

So we decided, let’s bake it into the video. It seems like it’s the best experience. Okay, as the engineer, I still had a lingering doubt about that model I developed, which was similar to a vector drawing model where you have a drawing, and a bunch of sub-objects. The animation code that I wrote was pretty janky and didn’t feel like it fit with the experience we were trying to go for and that proof of concept. But, I didn’t let that be a blocker for switching gears. Let’s focus now on rendering.

Real Code (8:24)

Now we’re going to get into the real code. We’re really pumped, and it’s time to dive into lower-level drawing. For that, we’re going to use the GPUImage library. A lot of people have used this library, and it is awesome. For 99% of apps, this is exactly what you need to write real-time processing of photos or videos. The way it’s used right now in Periscope is to power the video processing stack: it converts different video formats into something consistent that we can work with and upload to the broadcasting stream.

For instance, we accept video from DJI drones. I’m not going to get into the particulars of the video format, but basically, the video comes in as what’s known as YUV. DJI uses a slightly different version of YUV video that’s known as three plane instead of two plane. The long and short of it is that we need to write a custom filter that converts it into a consistent format to work with. Also, we’re using GPUImage to downscale to 320x568 before we upload to the endpoint.

The way that it works right now is there’s a video processor that receives frames from AVFoundation. Each frame is fed through the processor; it does those operations, and then the frame is sent up to the network stack. So all that we need to do is, on the UI level, create a touch interpreter model that takes in the touch points as you move on the screen, as well as whether the camera is landscape or portrait, and converts them to the appropriate coordinates in the video. Then I feed that into a model object that exists within the GPUImage stack as a separate filter, which does all the rendering of the sketches that we’ll get into.
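To make the touch interpreter idea concrete, here is a minimal sketch in C of mapping a touch to normalized video coordinates. The names `Point2`, `CameraOrientation`, and `touch_to_video` are hypothetical, and the direction of the landscape rotation is an assumption; the talk does not show the real implementation.

```c
#include <assert.h>

typedef struct { float x, y; } Point2;

/* Hypothetical orientation flag; the real app's type is not shown in the talk. */
typedef enum { CAMERA_PORTRAIT, CAMERA_LANDSCAPE } CameraOrientation;

/* Map a touch (in view points) to normalized 0..1 coordinates in the video frame. */
Point2 touch_to_video(Point2 touch, float view_w, float view_h,
                      CameraOrientation orientation)
{
    assert(view_w > 0.0f && view_h > 0.0f);
    float u = touch.x / view_w;
    float v = touch.y / view_h;
    if (orientation == CAMERA_LANDSCAPE) {
        /* Assumed 90-degree rotation: the view's vertical axis
           becomes the frame's horizontal axis. */
        Point2 rotated = { v, 1.0f - u };
        return rotated;
    }
    Point2 upright = { u, v };
    return upright;
}
```

Working in normalized coordinates keeps the interpreter independent of the 320x568 output size, so the same points could be handed to more than one renderer.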

Why People Struggle with OpenGL (10:51)

For those of you who are just diving into OpenGL, set as an ambitious goal spending one day getting a red dot on screen. Okay, there’s a lot of background and theory and understanding the existing rendering system before you can get to this moment. It will frighten your PM when you tell them “I spent a day putting a red dot on the screen,” but you’re making great progress. In fact, I’m going to go into pretty much most of the theory you need to do something like this in about ten minutes. In fact, I’m not going to show more than one line of OpenGL.

  1. There are two elements of OpenGL: there’s the API, and the underlying theory. And the bad news is, OpenGL has a really bad API. It came out in 1991. I’m sure many people in this room were born after 1991, and it doesn’t even use an object-oriented API. It uses this wacky state machine, not the nice state machine like with Core Graphics. It’s this weird state machine based on binding, and newer versions of OpenGL are moving away from that to something that’s a bit more intuitive. What’s more important is to understand the theory behind it; in many ways, Metal is almost a better beginner’s graphics platform because it doesn’t use this crazy metaphor.
  2. Multi-threading, which we’ll get into later, gets pretty complicated when you want to do rendering off the main thread. It’s very error-prone because of thread-based state.
  3. There’s a ton of legacy support from this old API, and you’ll often see code snippets that do things the old way, which is no longer recommended and is less performant. I liken it to jumping into Game of Thrones in season four, like “Wait, who’s, what, no, oh, who’s that, okay.”

GPU Parallelism (11:55)

Now, to understand the GPU. There’s a misconception that GPUs are really really fast. Some might say, move it to the GPU and it’ll be lightning fast. The way to think of the GPU is that it’s great at parallelism. So in fact, your CPU on an iPhone 6 is at 1.4 gigahertz whereas each core on the GPU is 450 megahertz. However, your GPU usually has many more cores, and I couldn’t get the exact number to put it on the slide, but the SIMD width of each of those cores is wider. Which basically means within each instruction, it can operate on more data items. So it’s really good at processing vectors. It’s possibly an order of magnitude more that can be processed in parallel. If you can find a highly parallelizable algorithm, that’s a pretty strong sign that it can sit on the GPU, and you’ll see in the case of deep learning neural nets an order of magnitude improvement.

The big trade-off that you should understand is that sending anything to the GPU to operate on is expensive because of the round trip. Copying over the memory can easily become a bottleneck, so you just need to be aware of that, as we’ll get into later. For instance, if you’re a game developer this is why you’ll often load all of your resources before the game level begins, because if you tried to upload a new texture map or a complex model within 17 milliseconds you’ll probably drop frames. The other thing to understand is that the GPU isn’t necessarily optimized for drawing, either. Apple experimented with moving Quartz over to OpenGL rendering on OS X, and they called it QuartzGL.

Not always optimal in rendering (13:23)

Matt Gallagher at Cocoa with Love turned on these features through debug mode in OS X and then ran some benchmarks. The reason that Apple never followed through and made this the full-fledged rendering system on OS X is that it was faster for some operations, but for things like rendering lines, as you start approaching something you might use Adobe Illustrator for, it’s actually a lot less performant. So, Quartz 2D in Core Graphics still has a place, and it’s really important to benchmark it for the type of operations that you want. And don’t assume going to OpenGL is going to be faster.

What we’re going to do is pretend that we’re going to be building a 3D game to start out because ultimately, all of the GPU is designed to do is render 3D games. We’re kind of abusing some of the things that it’s really optimized for, for the sake of processing images. And I’ll explain how GPUImage does this in a few minutes.

Shaders (14:11)

The first bit of terminology you’ll hear used is “shaders.” Shaders are tiny programs that you write in a custom language called GLSL, GL Shading Language. They’re uploaded to the GPU to operate massively in parallel on rendering. I’ll dive in right now and explain what exactly they do.

Let’s say that you’re building a 3D game, and you want to render this:

Stanford Bunny

The Stanford bunny is made up of 30,000 vertices, which are the coordinates that make up the mesh that makes up the object. You may think of a vertex like in geometry class, where it’s just a coordinate. But in OpenGL, you can also attach additional metadata to each point, like the color of this vertex, or other data that you want to feed to the rendering pipeline later. So a vertex is usually the coordinates plus whatever other data about that one particular point in space you want to feed into the rendering system.

Now, OpenGL uses a bunch of math that we don’t need to get into where it takes these coordinates in this arbitrary 3D space, and does a lot of transforms that involves matrices to flatten them out into an image on screen. It isn’t actually rendered yet; this is just for illustrative purposes. But it takes these arbitrary points and projects them onto a flat plane. And it does it in a -1 to +1 coordinate space because OpenGL is pretty much resolution independent. So the viewport is always negative one to positive one. If you wanted to, you could just arbitrarily feed points that are negative one to positive one and draw them within the viewport. But in the case of a 3D game, that would be kind of crazy. Your vertex shader that you would write would do this translation for you.
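As a small illustration of that -1 to +1 space, here is a sketch in C that converts a pixel position into viewport coordinates. The function name `pixel_to_ndc` is hypothetical, and the y flip assumes the input uses a top-left origin with y growing downward, as UIKit does.

```c
#include <assert.h>

typedef struct { float x, y; } Point2;

/* Convert a pixel position (origin top-left, y down) into OpenGL's
   resolution-independent viewport space, where both axes run -1..+1. */
Point2 pixel_to_ndc(Point2 pixel, float width, float height)
{
    assert(width > 0.0f && height > 0.0f);
    Point2 ndc;
    ndc.x = (pixel.x / width) * 2.0f - 1.0f;
    /* Flip: y grows downward on screen but upward in OpenGL. */
    ndc.y = 1.0f - (pixel.y / height) * 2.0f;
    return ndc;
}
```

For a 320x568 frame, the top-left pixel maps to (-1, 1), the center to (0, 0), and the bottom-right corner to (1, -1), whatever the actual resolution is.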

Now things get interesting because the fragment shader runs on the surface. For every pixel that makes up your object, your fragment shader runs and outputs a color value. This is where you copy and paste different rendering algorithms, in this case a Lambert algorithm to give it a nice matte surface, or where you calculate the lights that make up the scene. Often, you’re not reinventing the wheel; you’ll just be copying and pasting particular effects that you know.

What’s really interesting about this stage, is that in most games today you have texture maps that wrap around your models. And GPUs have dedicated hardware to make it really fast to pull out these pixel values and do stuff like real-time filtering to smooth them out. As you zoom in or zoom out of a texture, it’s nice and smooth through bilinear filtering that’s all hardware accelerated. Operating on texture maps is really fast. And incidentally, a texture is just a fancy word for a bitmap that’s been uploaded to the GPU.

OpenGL Flow (17:16)

So here is the overall flow when you’re working with OpenGL:

  1. You point OpenGL to a bunch of vertices, or you upload them to the GPU or point them to ones that are already sitting in a buffer on the GPU.
  2. You tell OpenGL: I want to render this set of vertices, using this vertex shader and this fragment shader. You can then flip some other switches, like how it should blend into the existing drawing.
  3. Then you issue a draw call.

In this case, we’re using glDrawArrays to draw the array of vertices, and use the GL_TRIANGLE_STRIP way of representing these vertices as triangles. So here, it’s going to use each vertex to assemble a strip of triangles that will run through the pipeline.

What’s interesting to know is that OpenGL is asynchronous. After you issue that draw call, it does not block and here’s where OpenGL is a slightly leaky abstraction. Behind the scenes, it’s pushing all of these commands into a command queue that’s being fed to the GPU. Although, it’s not going to execute immediately. By not immediately it may be a millisecond later. So, unless you do something unintentionally where you decide to read data back, it won’t block.

Just to wrap up, if you were building a 3D game, what you would then do is iterate over each model that makes up your scene and ask to render it. For today, you can just think of it as a painter’s model where you’d render from the most distant object to the closest one. In the real world, that’s not very practical, so OpenGL has facilities to handle cases like this example from Wikipedia, where the trees are blocking the background: aren’t you wasting a lot of time rendering stuff that’s ultimately blocked? Well, we won’t get into it, but there are ways to optimize around that. But for today, just think of it as a painter’s model where you render from the furthest objects to the closest.

Cool, so knowing all that, what does GPUImage actually do? It’s amazingly simple. It spits out four vertices that cover the viewport which would be your screen or your render destination. It just covers them, negative one to positive one. It feeds in a texture map to shade the quad that you’ve just created, in this case, a frame of video that was fed to it through AVFoundation. So all it’s doing is rendering a quad on screen and using fragment shaders to unlock the power of the GPU.
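The four vertices described above can be written out as plain data. This is a sketch in C, not GPUImage’s actual source: the array and function names are hypothetical, and the texture coordinate orientation is an assumption (real pipelines often flip or rotate it to match the camera).

```c
#include <assert.h>
#include <stddef.h>

/* Four corners covering the whole viewport, ordered for GL_TRIANGLE_STRIP. */
static const float kQuadPositions[8] = {
    -1.0f, -1.0f,   /* bottom left  */
     1.0f, -1.0f,   /* bottom right */
    -1.0f,  1.0f,   /* top left     */
     1.0f,  1.0f,   /* top right    */
};

/* Matching texture coordinates in 0..1 (orientation is an assumption). */
static const float kQuadTexCoords[8] = {
    0.0f, 0.0f,
    1.0f, 0.0f,
    0.0f, 1.0f,
    1.0f, 1.0f,
};

/* Return a pointer to the (x, y) pair of corner i. */
const float *quad_vertex(size_t i)
{
    assert(i < 4);
    return &kQuadPositions[i * 2];
}

/* Return a pointer to the (s, t) pair of corner i. */
const float *quad_texcoord(size_t i)
{
    assert(i < 4);
    return &kQuadTexCoords[i * 2];
}

/* A strip of n vertices yields n - 2 triangles: each new vertex
   forms a triangle with the two before it. */
size_t strip_triangle_count(size_t vertex_count)
{
    return vertex_count < 3 ? 0 : vertex_count - 2;
}
```

With four vertices the strip yields two triangles, which together form the quad that the fragment shader then fills in with the video frame.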

I’m not going to teach you the shading language today, but I’ll show you the “Hello, world” of shaders. At the top are the attributes that are fed into it. This is the vertex shader:

attribute vec4 position;
attribute vec4 inputTextureCoordinate;

varying vec2 textureCoordinate;

void main()
{
    gl_Position = position;
    textureCoordinate = inputTextureCoordinate.xy;
}

All it’s doing is saying, I don’t want to modify the vertex that’s being fed to me. I don’t want to project it onto a different viewport; I just want to pass it through. It’s handed a position, and it assigns that to a special variable, gl_Position. It’s also handed the texture coordinates for how the texture map should map onto that quad, and it just passes those through as well. Let’s move on to the fragment shader.

And here’s the “Hello, world” of fragment shaders:

varying highp vec2 textureCoordinate;
uniform sampler2D inputImageTexture;

void main()
{
    gl_FragColor = texture2D(inputImageTexture, textureCoordinate);
}

And this is going to be run on every pixel that makes up the surface. Where all it does is it takes in the coordinate for that quad and asks for the pixel that exists in the texture map. It pulls out and spits it out on screen.

Knowing all that, all we should have to do to render a sketch on screen is add another draw call to render the drawing, right? So, all we would, in theory, have to do is take each of those touch points and create vertices to draw a polygon that looks like a stroke, and then we’d write a fragment shader to make that polygon look more like a stroke than a big jagged square. And in reality, you’d probably have to do stuff like tessellation to smooth out the edges. But then as you’re reading through OpenGL, you notice that we passed it this GL_TRIANGLE_STRIP:

glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

And there’re different ways you can ask OpenGL to represent the vertices that you’re feeding it.

It turns out there’s another option called GL_POINTS. With it, OpenGL on iOS can represent each point as a dot on screen and, most importantly, you can represent each dot with a texture map:

glDrawArrays(GL_POINTS, 0, length);

And then the gears in your head start turning. You’re like wait a minute, so I could represent something like Photoshop where effectively you’re repeating the same stamp of a brush across screen. That’s actually pretty cool because it unlocks particular effects and in games, you might use this for particle effects. But this seems like it’s much better than trying to do a bunch of tessellation and rendering strokes and it limits you if you just treat this as a polygon right?

So let’s explore that.

All we’d have to do is take these different points that we’re rendering. Of course, there’d be gaps in between, but what if we just interpolate between each of these points? Then it starts to approach a nice shape with feathering at the edges, because it’s just the same stamped image, like in Photoshop. And you might think you’re brilliant for inventing it, until two days later you find Apple’s sample code that does just this. And you wish that you’d saved 80% of your time by copying and pasting their code. But no, no, no, it’s just validating your brilliance that someone else already wrote this for you.
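Filling the gaps between touch points might look something like the following C sketch. It is not Apple’s sample code or the shipped Periscope code; `interpolate_stamps` is a hypothetical name, and for simplicity the step count is taken from the larger of the horizontal and vertical distances rather than the true length of the segment.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Point2;

/* Write evenly spaced stamp positions from just after `a` through `b`
   into `out` (at most `cap` of them) and return how many were written.
   `a` itself is skipped so consecutive segments don't stamp it twice. */
size_t interpolate_stamps(Point2 a, Point2 b, float spacing,
                          Point2 *out, size_t cap)
{
    assert(spacing > 0.0f);
    float dx = b.x - a.x;
    float dy = b.y - a.y;
    float adx = dx < 0.0f ? -dx : dx;
    float ady = dy < 0.0f ? -dy : dy;
    /* Simplification: use the larger axis distance instead of the
       Euclidean length to decide how many stamps are needed. */
    float span = adx > ady ? adx : ady;
    size_t steps = (size_t)(span / spacing);
    if (steps == 0) {
        steps = 1;
    }
    size_t written = 0;
    for (size_t i = 1; i <= steps && written < cap; i++) {
        float t = (float)i / (float)steps;
        out[written].x = a.x + dx * t;
        out[written].y = a.y + dy * t;
        written++;
    }
    return written;
}
```

Each returned position would become one vertex drawn with GL_POINTS, so a smaller spacing gives a smoother stroke at the cost of more vertices per frame.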

So, alright, I’m feeling good about the rendering; that seems pretty cool. You can do some cool effects with that. But going back to that model, I wasn’t really satisfied with it. It felt like the original model we were building was optimized for building Adobe Illustrator. And so, you know, you meditate a bit on what we’re really doing here. In the concept video, you draw on screen, and after a while the stroke kind of dies away after a certain number of frames. You think, well, we’ve got a simulation with real-time graphics driven by user input. That sounds like a different field of software development.


In game development, you use the game loop, and the way that works is you just have a timer that fires every one-thirtieth of a second. During each tick of this game loop, it processes button presses. It updates the game state, like soldiers marching, where they each move one particular step, and then it does a render call. It’s just a simple loop that runs once every thirtieth of a second. But we already have a loop coming in through each of the frames that are being processed through AVFoundation. So we could just use the frames as they’re coming in to drive a simple game loop, but for sketching.

The Final Design (24:50)

The final design that actually felt really awesome was we treated each stamp like a particle in a particle simulation, where it’s born at age zero. For every tick of the game loop, the draw loop we just tick the age by one. And we set a maximum life of, I think we started out at 110 frames. When it reaches 110 frames, you remove the particle from the vertex buffer, and it will no longer be rendered.
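The per-frame aging step could be sketched in C like this. The `Particle` struct and `tick_particles` function are hypothetical stand-ins for whatever the real vertex buffer code does; the sketch only shows the idea of aging every stamp by one and compacting the buffer in place.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one stamp; the real vertex carries more fields. */
typedef struct { float x, y, age; } Particle;

/* Age every particle by one frame and drop those that have reached
   max_age. Survivors are compacted to the front of the array, in order.
   Returns the new particle count. */
size_t tick_particles(Particle *particles, size_t count, float max_age)
{
    assert(max_age > 0.0f);
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        particles[i].age += 1.0f;
        if (particles[i].age < max_age) {
            particles[kept] = particles[i];
            kept++;
        }
    }
    return kept;
}
```

Calling this once per incoming video frame, with a maximum age of 110, makes each stroke fade out a fixed number of frames after it was drawn, with no separate timer.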

This is the final struct that we used to represent each vertex:

typedef struct {
    GLfloat x;
    GLfloat y;
    GLfloat radius;
    GLfloat age;
    GLfloat dissolveAngle;
    GLfloat red;
    GLfloat green;
    GLfloat blue;
    GLfloat alpha;
} SketchPoint;

Only two of those properties are actual x and y; the rest is metadata that we’re going to be feeding into the vertex shader. We’ll talk about the dissolveAngle one a little later.
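To show how a struct like this is typically described to OpenGL, here is a C sketch that computes the stride and byte offset of each attribute. The `GLfloat` typedef is a stand-in so the example compiles without the OpenGL headers, and `AttribLayout`, `kSketchLayout`, and the pairing of struct fields with shader attribute names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef float GLfloat;  /* stand-in for the OpenGL ES typedef */

typedef struct {
    GLfloat x;
    GLfloat y;
    GLfloat radius;
    GLfloat age;
    GLfloat dissolveAngle;
    GLfloat red;
    GLfloat green;
    GLfloat blue;
    GLfloat alpha;
} SketchPoint;

/* Nine tightly packed floats; the layout below relies on no padding. */
static_assert(sizeof(SketchPoint) == 9 * sizeof(GLfloat),
              "SketchPoint is expected to have no padding");

/* One entry per shader attribute: how many floats it reads and where
   they start inside each SketchPoint. */
typedef struct {
    const char *name;
    int components;
    size_t offset;
} AttribLayout;

static const AttribLayout kSketchLayout[] = {
    { "position",       2, offsetof(SketchPoint, x) },
    { "radius",         1, offsetof(SketchPoint, radius) },
    { "aAge",           1, offsetof(SketchPoint, age) },
    { "aDissolveAngle", 1, offsetof(SketchPoint, dissolveAngle) },
    { "aColor",         4, offsetof(SketchPoint, red) },
};

/* Distance in bytes from one vertex to the next in the buffer. */
size_t sketch_stride(void)
{
    return sizeof(SketchPoint);
}
```

Each entry corresponds to one glVertexAttribPointer call, with the stride as the step between vertices and the offset as the starting position. Supplying only two components for a vec4 attribute is fine, because OpenGL fills the missing z and w with 0 and 1.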

So each vertex knows its age. We decided to feed that into the system. What if you used the age of each point to drive particular effects? So inside your shader, you could say: for the first 10% of the life of this particle I want to fade in. Or I want to mix between a white color and my final color, and I want to make myself a little larger for the first 10% and then for the last 10% of my life, I want to use that value to drive the opacity. So here, now we’re starting to get a lot more fun.

Here’s a snippet from the vertex shader, and notice dissolveAngle:

float life = (aAge / uMaxAge);
lowp float outroValue = smoothstep(0.75, 1.0, life);
gl_Position.x += (cos(aDissolveAngle) * outroValue * 0.01);
gl_Position.y += (sin(aDissolveAngle) * outroValue * 0.01);

When we created each particle, I chose a random angle out of 360 degrees so that, toward the end of its life, the particle would sort of shoot off in a random direction and shrink. Here we’re using the smoothstep function that’s part of GLSL. You give it two edges, in this case 0.75 and 1.0, and as the input moves between them it returns a value between zero and one based on how far along you are between those two edges.

Basically outroValue toward the end of the life of the particle will ramp up from zero to one. And I use that to drive where I want the point to drift off in a different direction. That’s a lot more fun.
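For readers who want to see what smoothstep computes, here is a CPU-side sketch in C that follows the standard GLSL definition (clamp, then a cubic ease). The function names `smoothstep_f` and `outro_value` are hypothetical; only the 0.75 and 1.0 edges come from the shader snippet.

```c
#include <assert.h>

/* Clamp v into the range 0..1. */
static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Same formula GLSL specifies for smoothstep(edge0, edge1, x). */
float smoothstep_f(float edge0, float edge1, float x)
{
    assert(edge0 < edge1);
    float t = clamp01((x - edge0) / (edge1 - edge0));
    return t * t * (3.0f - 2.0f * t);
}

/* Mirror of the shader's outroValue: 0 until 75% of the particle's
   life has passed, then easing up to 1 at the end of its life. */
float outro_value(float age, float max_age)
{
    assert(max_age > 0.0f);
    return smoothstep_f(0.75f, 1.0f, age / max_age);
}
```

With these edges the ramp covers the final quarter of the particle’s life, so a particle halfway through its life is not displaced at all, and one at the very end is displaced by the full 0.01 along its dissolve angle.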

I’m not going to go line by line to break down this vertex shader, but it’s less than 50 lines of code:

uniform float uMaxAge;
uniform mat4 uTransform;
uniform float uPointScale;

attribute vec4 position;
attribute float radius;
attribute vec4 aColor;
attribute float aAge;
attribute vec4 inputTextureCoordinate;
attribute float aDissolveAngle;

varying lowp vec4 vColor;
varying lowp float vLife;
varying vec2 textureCoordinate;

void main()
{
    gl_Position = position;
    gl_PointSize = radius * uPointScale;
    vColor = aColor;
    // Normalized age: 0.0 at birth, 1.0 at the end of the particle's life.
    float life = (aAge / uMaxAge);
    vLife = life;
    lowp float introValue = (1.0 - smoothstep(0.0, 0.05, life));
    lowp float outroValue = smoothstep(0.75, 1.0, life);
    lowp float flairUpValue = smoothstep(0.7, 0.8, life);
    lowp float shrinkValue = 1.0 - smoothstep(0.7, 0.95, life);
    // Drift along the particle's dissolve angle near the end of its life.
    gl_Position.x += (cos(aDissolveAngle) * outroValue * 0.01);
    gl_Position.y += (sin(aDissolveAngle) * outroValue * 0.01);
    gl_Position *= uTransform;
    // Blend toward white at birth and again just before fading out.
    vColor.rgb = mix(vColor.rgb, vec3(1.0, 1.0, 1.0), introValue * 0.6);
    vColor.rgb = mix(vColor.rgb, vec3(1.0, 1.0, 1.0), flairUpValue * 0.6);
    gl_PointSize *= (introValue + 1.0) * shrinkValue;
}

That gets run on every vertex. And the fragment shader is tiny: we just pull out the texture map for that brush, and we spit that out as the particle we’re rendering on screen.

Of course, at the last minute, you get new requests from product and so what if, because we want to keep the product limited, we don’t want to give the user a full RGB color wheel to choose color, we want to limit it to say three colors like in the prototype. But what if we let them have a color picker to pick something on screen and use that as a source color? That could be pretty fun and let users be creative.

You’re probably wondering, how do you do that in OpenGL? I don’t know, and I don’t care; I used Core Image. It isn’t important to rendering, so I just dropped that in, and it let me implement it in about an hour and a half.

Real World Issues (29:25)

This is the near-final version of the feature. Now we get to the fun part, the real world as we’re merging the code in and trying it on different devices and real world issues you run into.

Video Stabilization latency (29:31)

A few months ago they turned on video stabilization. If you haven’t used this API on newer iPhones, it’ll stabilize video, but there’s additional latency between when you record video and when AVFoundation returns the frames, because it’s doing its stabilization magic. So in fact, what Periscope shows on screen is just an AVCaptureVideoPreviewLayer, from before the video is sent upstream. It’s not 100% what is getting broadcast; it’s the pre-stabilized video. The reason for doing that is that it would be kind of weird if we showed you the post-stabilized video: if there’s even just a quarter of a second of lag as you move your camera around, it feels weird.

So one option would have been to turn off video stabilization while you’re in sketch mode, but that felt like compromising the user experience, and you’d have to deal with different states of the app. So another option we explored was creating two instances of the same sketch engine. One renders to an OpenGL layer on screen with a transparent background, and that gets composited by the Core Animation compositor on top of the AVCaptureVideoPreviewLayer. So we have two instances running, both processing the same points: the on-screen one, which is mostly set up with GLKit, and the final renderer that runs inside the video processing stack.

OpenGL Multithreading (31:02)

Now we get into GL multithreading. As I said earlier, OpenGL is not really designed around multithreading. In fact, you can only have one OpenGL context active per thread. So you say, set the current thread’s OpenGL context to this off-screen rendering context or this on-screen rendering context, because I’m about to operate on it.

Now here’s the twist.

You may not know this, but Grand Central Dispatch serial queues are not dedicated threads. They only guarantee that work items execute in serial order. It isn’t documented much, but if GCD decides it’s optimal to execute the next work item on the main thread, it will run on the main thread and not on that separate thread it had been using 99% of the time. Because you’re mostly working with one OpenGL context for your whole app, this usually doesn’t become a problem. But we found a race condition where sometimes it would execute certain OpenGL calls that should have been destined for GPUImage on the main thread, where the preview layer was executing. It would mess up this crazy OpenGL state machine and delete shader programs that should be sitting on the GPU, and then, you know, you’d waste two days trying to debug multithreading issues. So we ended up having to staple on a bunch of protective code to make sure that we’re operating on the right OpenGL context.

Performance (32:30)

The second real-world issue, of course, is performance. Of course, everyone here has an iPod touch fifth gen sitting on your desk. So before you ship your code, you plug it in and see how it performs in the real world. You find the oldest hardware that you support and run it on a real device. Now mind you this is going to be exacerbated because I was also recording to my iMac and I was hooked up to Xcode, so it’s going to be a little slower.

Under a lot of heart load, because of both the hearts as they’re being spit out and the interpolation code where it’s blending the different vertices in-between, they were wrestling for control of the main thread and it would drop touch events. It wasn’t quite as bad as the example in the video, but it did highlight an issue we were running into. And the best way to optimize always is to start by measuring. So you drop in Instruments and see where your bottleneck is. And there were two areas we optimized:

  1. The amount of time it takes to interpolate the touch points, so we tried to move that off onto a background thread.
  2. We doubled back and looked at the heart generation code. It took about 2.4 milliseconds on the oldest hardware to spit out each of those hearts. Diving into it, it turns out that using UIKit’s animation with spring damping is relatively expensive. By moving that over to a prebaked spring animation, we were able to drop it down to 1.4 milliseconds, around a 40% improvement, which gives us more headroom and a little less wrestling on the main thread.
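The talk does not say how the spring animation was prebaked. One plausible approach, sketched below under that assumption, is to evaluate the closed-form step response of an underdamped spring once, store the samples in a table, and look values up at animation time instead of simulating a spring per heart. The function names and the parameter values (damping ratio, frequency, duration, frame rate) are illustrative, not Periscope’s. On iOS such a table could presumably be handed to a keyframe animation. For reference, going from 2.4 ms to 1.4 ms is (2.4 - 1.4) / 2.4, or roughly 42%, which matches the "around 40%" quoted above.

```python
import math


def bake_spring(damping_ratio=0.6, frequency_hz=2.0, duration=1.0, fps=60):
    """Precompute the 0-to-1 step response of an underdamped spring.

    Uses x(t) = 1 - e^(-d*t) * (cos(wd*t) + (d / wd) * sin(wd*t)),
    where d = damping_ratio * w and wd = w * sqrt(1 - damping_ratio^2).
    """
    if not 0.0 < damping_ratio < 1.0:
        raise ValueError("this sketch only handles underdamped springs")
    omega = 2.0 * math.pi * frequency_hz                    # natural frequency
    omega_d = omega * math.sqrt(1.0 - damping_ratio ** 2)   # damped frequency
    decay = damping_ratio * omega                           # envelope decay rate
    frames = int(round(duration * fps))
    table = []
    for i in range(frames + 1):
        t = i / fps
        envelope = math.exp(-decay * t)
        oscillation = math.cos(omega_d * t) + (decay / omega_d) * math.sin(omega_d * t)
        table.append(1.0 - envelope * oscillation)
    return table


# Baked once, up front; every animation afterwards only does lookups.
SPRING_TABLE = bake_spring()


def spring_value(progress, table=SPRING_TABLE):
    """Look up the baked curve at progress in [0, 1], linearly interpolated."""
    progress = min(max(progress, 0.0), 1.0)
    position = progress * (len(table) - 1)
    index = int(position)
    if index >= len(table) - 1:
        return table[-1]
    fraction = position - index
    return table[index] + (table[index + 1] - table[index]) * fraction
```

The trade-off in this sketch is flexibility for cost: every heart shares the same curve, so the per-heart work shrinks to a table lookup and a multiply.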

Epilogue (34:01)

So, that’s the bulk of all the work. It launched in April and everyone seemed to love it. And so we were asking ourselves, based on that learning and where we want to go in the next year, what’s the next step?

Well, again, GPUImage is an awesome framework. You should all be using it rather than writing your own OpenGL. Unfortunately, it’s about 29,000 lines of code and half of it is filters that we don’t use. We’re on the GPUImage 1.0 code base; we’re not looking at 2.0 because it’s written in Swift, and Swift isn’t at the point where we’re comfortable using a Swift library in the app. A lot of the code in the 1.0 line is hardware compatibility checks, like checking whether you can use what are known as red textures, or whether you support this version of OpenGL. It also does a lot of stuff that’s convenient for a generic image processing framework, like resource pooling, so it reuses textures and is more resource efficient. And it does a lot of stuff like rendering on screen for you.

But the problem is, any generic framework is not going to perform as well as something you optimized for one particular use case. So far I’ve spent about a week and a half playing around with the question: what’s the smallest surface area we can build to support what we were using GPUImage for? It comes out to about 1,500 lines of code, and we found that, on older hardware, we’ve reduced resource utilization from 29% to 4%. On newer hardware, we’re also doing video stabilization, so iOS is spending a little more resource utilization, but as far as the resources our app is using, we’ve seen it drop as well. More importantly, there’s a smaller surface area to understand.

By inserting our code inside GPUImage, we’re messing around with a bunch of state that the GPUImage authors assume you’re not touching. So we have to leave everything just as we found it, or else we could mess things up as later parts of the GPUImage pipeline operate. We have investigated Metal, and it is awesome. However, not all of the iOS hardware we support is compatible with Metal. Likewise, OpenGL ES 3.0 could give us some more performance improvements, but not all the hardware we support allows that.

I’d like to give a shout out to everyone who was involved in the project, from PMs to engineering management, to the people who dove through my code in code review and the design team. It certainly isn’t just a one-person project, and a lot of people were involved.

Q&A (36:54)

Q: In the beginning of your talk, you were talking about how you would approach building such a large feature like this, prototype, minimum questions, answers, etc. Do you have any recommendations on reading material for planning out big features like this?

BS: So Jonathan Blow, who developed Braid, which is an awesome game, for those of you who haven’t played it, it involves a time travel mechanic which is kind of out there. He gave a talk, I think around the time that the game came out, where he showed the various prototypes he went through, including a completely different version that evolved into Braid. And so, I would say looking at what game developers do would be one way of stealing from other fields. Like even paper prototypes. Apple gave a talk on designing for future hardware. They showed some of their thought process, and they’re one of the best design teams in the world.

But yeah, I’d say that in general cross-pollination with other software development fields like game development, would be an awesome thing you could bring back to general product development.

Q: Why did you decide to use a color picker instead of a color wheel?

BS: I am certainly not the product person who made this decision, so I don’t want to explain all the deliberation that went into it. I will say that you’re never going to please 100% of people, and you need to balance it out with the overall feeling of your product. And I’m sure that they had tons of different colors they had gone through before they decided these are the most neutral and these are the most versatile. I don’t know exactly why they arrived on those three, but I think that they’re talented designers, so they probably put a lot of thought into it.

Q: When you’re actually compressing it into video, MP4, are you guys using a similar kind of style as GPUImage, or are you writing your own code there? What does your pipeline look like for that?

BS: Sure, are we using our own pipeline for compressing the image? So, I don’t handle that part of the video stack. Gurant, who’s a really, really amazing video engineer, does a lot of that. But my understanding is there are a couple of different broadcast video formats that we use. There’s very low latency, and then on certain devices where it’s not available, we use high latency. So I believe there is an element of our own code. Don’t quote me on it, but I believe that the lower latency stuff is not available in AVFoundation. But a lot of what we are leveraging is inside of AVFoundation.

Q: What was the thought process that went into choosing to bake in the drawing into the video?

BS: That’s a really good question, and a lot of thought was given to it. It’s kind of like, wouldn’t it be perfect to have the best quality, kind of like how they remaster the Beatles records every few years because they have the original tapes. One consideration is that ultimately the version of video that’s being uploaded is still 320 by 568. Also, even that vector example we were using used CAShapeLayer. A friend of mine who works in visual effects said: “The secret of really good visual effects is never leave a clean spot.” If you have one really awesome perfect 3D object, it’s going to stand out and look fake.

So in a way, by baking into video and sharing the same artifacts, it feels more organic and goes with the product. And also, again, you know, there’s just so many questions around the distribution element and, are you going to take advantage of that now or, you know, there’s always the possibility down the road we could rewrite this. But it just seemed as though this was the simplest thing that worked and provided the most reach to broadcasters. So it wasn’t a five-minute decision.

Q: How long was it from your initial idea to it actually being in the app, and do you deal with Android and the GPU frameworks over there?

BS: I was not working on this 40 hours a week. I do other work, so it’s hard to give a one to one. But I started work on a prototype in early February, about one day a week, and it shipped at the end of April. I would estimate it was less than four weeks of development time from writing the first hacky prototype to actually shipping it into production. I’m cheating for two reasons, because I’m kind of compressing what I was doing part time. Also, I had this great advantage where I was off doing my thing. I didn’t have to sit in on any meetings; I didn’t even have to sit in an open office plan. But I would say that, as far as number of hours in a perfect world, you could say it was less than a month of development time.


About the content

This talk was delivered live in June 2016 at AltConf. The video was recorded, produced, and transcribed by Realm, and is published here with the permission of the conference organizers.

Ben Sandofsky

Ben Sandofsky builds apps, advises startups, and teaches with CodePath. He has shipped software for over a decade, from tiny startups to giant enterprise companies. He spent over four years at Twitter; among other projects, he was tech lead for Twitter for iPhone, iPad, and Mac. Last year he was a technical consultant for HBO’s Silicon Valley.
